library(tidyverse)
library(RMySQL)
library(ggwes)
library(knitr)
library(kableExtra)
library(pander)
library(ggthemes)
library(readxl)
In this section, we try to answer our research questions using the data we collected for the R programming language. We have already covered a general EDA; here we characterize R packages by sector, organization/institution, and country, and attribute credit to the most influential actors by aggregating over these characterization variables. We also construct a package network to see how packages are linked to each other. Finally, we use a number of impact measures (e.g., additions, reverse dependencies) to identify the most important packages in the R community. Some impact measures (e.g., stars, forks) are only available for the packages for which we were able to collect GitHub data.
cran: Full CRAN Database as of September 2023 with selected metadata
cran_repos: CRAN GitHub repos loaded from database containing repository metrics
cran_users: CRAN GitHub Users data loaded from database containing information like sector/organization
user_commits: CRAN GitHub user commit data containing additions and deletions
user_countries: CRAN User country data cleaned
cran <- read.csv("\\\\westat.com\\DFS\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\cran.csv")%>%
dplyr::select(-X)
cran_repos <- read.csv("\\\\westat.com\\DFS\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\cran_repos.csv")%>%
dplyr::select(-X)
cran_users <- read_excel("\\\\westat.com\\DFS\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\cran_users.xlsx")
cran_users <- cran_users[,-1]
user_commits <- read.csv("\\\\westat.com\\DFS\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\cran_user_commits.csv")%>%
dplyr::select(-X)
user_countries <- read.csv("\\\\westat.com\\DFS\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\cran_user_countries.csv")%>%
dplyr::select(-X)
Out of 19,852 packages, we were not able to identify a sector for 12,721. Of the 7,131 packages where a sector was found, 6,240 were identified as academic, 583 as business, 166 as government, and 142 as nonprofit.
## sectors based on packages
pander(table(cran$Sector, useNA = "always"))
| Academic | Business | Government | Nonprofit | Unknown | NA |
|---|---|---|---|---|---|
| 6240 | 583 | 166 | 142 | 12721 | 0 |
Out of 10,821 unique maintainers, we were able to identify a sector for 4,014. Of these, 3,639 are from the academic sector, 196 from the business sector, 87 from the government sector, and 92 from the nonprofit sector.
## sectors based on unique maintainers
cran_unique <- cran %>%
distinct(email, .keep_all = T)
pander(table(cran_unique$Sector, useNA = "always"))
| Academic | Business | Government | Nonprofit | Unknown | NA |
|---|---|---|---|---|---|
| 3639 | 196 | 87 | 92 | 6807 | 0 |
Based on all CRAN Packages that we were able to extract a sector from, 88% are academic, 8% are business, 2% are government, and 2% are nonprofit. When looking at the unique maintainers, 91% are academic, 5% are business, 2% are government, and 2% are nonprofit.
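As a quick arithmetic check of the percentages quoted above, the sketch below recomputes them from the sector counts shown in the two tables. The vector names are illustrative only and are not part of the analysis code; base R is used so the snippet runs without extra packages.

```r
# Counts of packages per identified sector, copied from the package-level table
pkg_counts <- c(Academic = 6240, Business = 583, Government = 166, Nonprofit = 142)

# Counts of unique maintainers per identified sector, copied from the maintainer-level table
maint_counts <- c(Academic = 3639, Business = 196, Government = 87, Nonprofit = 92)

# Share of each sector among records with a known sector, in whole percent
pkg_pct   <- round(100 * pkg_counts / sum(pkg_counts))
maint_pct <- round(100 * maint_counts / sum(maint_counts))

print(pkg_pct)
print(maint_pct)
```

The denominators are the 7,131 packages and 4,014 maintainers with a known sector, so the "Unknown" group does not enter these shares.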
# Calculate counts by sector (All packages)
cran_sector_counts <- cran %>%
filter(Sector != "Unknown") %>%
count(Sector) %>%
mutate(proportion = n / sum(n),
proportion_label = paste0(round(proportion * 100, 1), "%"))
# Save plot
cran_sector_counts_plot <- ggplot(cran_sector_counts, aes(x = Sector, y = n)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = proportion_label), vjust = -0.3) +
ylab("Count of Packages") +
ylim(c(0, 7000))+
ggtitle(label = "Sector Distribution of All R packages")+
labs(caption = "*64% Unknown for packages (removed from analysis)")+
theme_clean()
cran_sector_counts_plot
# Calculate counts by sector (For unique Maintainers)
cran_sector_counts_unique <- cran %>%
distinct(email, .keep_all = T)%>%
filter(Sector != "Unknown") %>%
count(Sector) %>%
mutate(proportion = n / sum(n),
proportion_label = paste0(round(proportion * 100, 1), "%"))
# Save plot
cran_sector_counts_unique_plot <-ggplot(cran_sector_counts_unique, aes(x = Sector, y = n)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = proportion_label), vjust = -0.3) +
ylab("Count of Maintainers") +
ylim(c(0, 7000))+
ggtitle(label = "Sector Distribution of All Unique R Package Maintainers")+
labs(caption = "*63% Unknown for unique maintainers (removed from analysis)")+
theme_clean()
cran_sector_counts_unique_plot
Based on all packages, the institution most frequently identified in the maintainer email domains is RStudio, followed by Harvard University. However, if we count unique maintainer emails instead, Harvard becomes the most frequently identified institution and RStudio drops to fifth. This suggests that a relatively small number of RStudio maintainers account for many of its packages.
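One way to make the package-versus-maintainer comparison concrete is to divide the package count by the unique-maintainer count for a few institutions. The sketch below copies the counts from the two top-10 tables in this section; the vector names and shortened institution labels are illustrative assumptions.

```r
# Package and unique-maintainer counts copied from the two top-10 tables
packages    <- c(RStudio = 329, Harvard = 146, NetEase = 98)
maintainers <- c(RStudio = 49,  Harvard = 71,  NetEase = 57)

# Average number of packages per unique maintainer email
pkgs_per_maintainer <- round(packages / maintainers, 1)
print(pkgs_per_maintainer)
```

A much higher ratio for one institution indicates that its packages are concentrated among comparatively few maintainer emails.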
### sorting to the top 10 most common institutions for packages
top10_Institutions <- sort(table(cran$Institution), decreasing = T)
top10_Institutions <- as.data.frame(head(top10_Institutions, 10))
colnames(top10_Institutions) <- c("Institution", "Freq")
### joining to institution dataframe to get sector variable
top10_Institutions <- cran %>%
right_join(top10_Institutions, by = "Institution")%>%
distinct(Institution, .keep_all = T)%>%
select(Institution, Sector, Freq)%>%
arrange(desc(Freq))
### sorting to the top 10 most common institutions for distinct maintainers
top10_Institutions_unique <- sort(table(cran_unique$Institution), decreasing = T)
top10_Institutions_unique <- as.data.frame(head(top10_Institutions_unique, 10))
colnames(top10_Institutions_unique) <- c("Institution", "Freq")
### joining to institution unique dataframe to get sector variable
top10_Institutions_unique <- cran %>%
right_join(top10_Institutions_unique, by = "Institution")%>%
distinct(Institution, .keep_all = T)%>%
select(Institution, Sector, Freq)%>%
arrange(desc(Freq))
### Graph output of top 10 institutions for packages
ggplot(top10_Institutions, aes(x = reorder(Institution, Freq), y = Freq, fill = Sector))+
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(expand = c(0,0), limits = c(0, 350)) +
labs(x = "", y = "Number of Packages",
title = "Top 10 Institutions for All R Packages" ) +
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme_clean()
### Graph output of top 10 institutions for unique maintainers
ggplot(top10_Institutions_unique, aes(x = reorder(Institution, Freq), y = Freq, fill = Sector))+
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(expand = c(0,0), limits = c(0, 350)) +
labs(x = "", y = "Number of Maintainers",
title = "Top 10 Institutions for Unique Maintainers" ) +
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme_clean()
### Table output of top 10 Institutions for packages
top10_Institutions %>%
kbl(caption = "Most Frequent Institutions for Packages", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| Institution | Sector | Freq |
|---|---|---|
| RStudio | Business | 329 |
| Harvard University | Academic | 146 |
| University of California-Berkeley | Academic | 103 |
| NetEase | Business | 98 |
| University of Washington-Seattle Campus | Academic | 91 |
| University of Michigan-Ann Arbor | Academic | 91 |
| University of Minnesota-Twin Cities | Academic | 84 |
| Stanford University | Academic | 84 |
| University of Wisconsin-Madison | Academic | 79 |
| University of Auckland | Academic | 78 |
### Table output of top 10 Institutions for unique maintainers
top10_Institutions_unique %>%
kbl(caption = "Most Frequent Institutions for Unique Maintainers", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| Institution | Sector | Freq |
|---|---|---|
| Harvard University | Academic | 71 |
| NetEase | Business | 57 |
| University of Washington-Seattle Campus | Academic | 55 |
| University of Michigan-Ann Arbor | Academic | 54 |
| RStudio | Business | 49 |
| University of Minnesota-Twin Cities | Academic | 46 |
| University of California-Berkeley | Academic | 39 |
| Stanford University | Academic | 36 |
| University of Wisconsin-Madison | Academic | 34 |
|  | Business | 32 |
As stated in the introduction, we also collected data from GitHub for all R packages that we were able to link to a repository. GitHub provides additional data, including repository statistics and contributor-level data, where a contributor is any individual who collaborates on a given repository. This lets us compare distributions at both the maintainer and contributor levels. For now, we stay at the package level, meaning we use the maintainer-level information of each package.
After linking to GitHub, we were able to identify repository data for 7,844 of the 19,852 packages on CRAN.
We first have to extract the slug (the owner/repo portion of the URL) from every package that has a GitHub URL.
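To illustrate what slug extraction does, here is a minimal base R sketch on a few made-up URL strings. It is not the pipeline used in this analysis, which relies on stringr; the sample URLs, the `pattern` variable, and the use of `regexec()` are assumptions chosen so the snippet runs without extra packages.

```r
# Hypothetical URL field values in the style of a CRAN DESCRIPTION file
urls <- c(
  "https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org",
  "https://github.com/r-lib/rlang/issues",
  "https://example.org/mypkg"
)

# Capture "owner/repo": two path segments after github.com/, each stopping
# at a slash, comma, or whitespace
pattern <- "github\\.com/([^/,[:space:]]+/[^/,[:space:]]+)"
hits <- regmatches(urls, regexec(pattern, urls, ignore.case = TRUE))

# Keep the captured group, or NA when the URL is not a GitHub link
slugs <- vapply(
  hits,
  function(m) if (length(m) == 2) m[2] else NA_character_,
  character(1)
)
print(slugs)
```

Stopping the match at commas and whitespace handles URL fields that list several links, and stopping at the second slash drops trailing paths such as an issues page.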
#### filtering for URLs that only contain github.com in the link
cran_github <- cran %>% filter(grepl("https://github.com", URL, ignore.case = TRUE))
### extracting the URL portion with the slug
cran_github <- cran_github %>%
mutate(URL = str_extract(URL, "https://github.com/([^/]+)/([^/]+)"))
### removing commas
cran_github <- cran_github %>%
mutate(URL = sub(",.*$", "", URL))
### extracting slug portion
cran_github <- cran_github %>%
mutate(slug = str_extract(URL, "(?<=github.com/)[^/]+/[^/]+"))
cran_github <- cran_github %>%
mutate(slug = str_extract(slug, "[^\\s]+/[^\\s]+"))
We can now join the original cran dataframe to the repositories we collected data for.
### creating slug for linkage
cran_repos <- cran_repos %>%
mutate(slug = paste(owner, repo, sep = "/"))
### join to cran_github by slug for more data
cran_repos <- cran_github %>%
left_join(cran_repos, by = "slug")%>%
distinct(slug, .keep_all = T)
### create "year_created" variable
cran_repos$year_created <- substr(cran_repos$created_at, 1, 4)
Out of 7,844 packages identified on GitHub, we were able to identify a sector for 2,379. Of these, 1,858 were identified as academic, 385 as business, 70 as government, and 66 as nonprofit.
pander(table(cran_repos$Sector, useNA = "always"))
| Academic | Business | Government | Nonprofit | Unknown | NA |
|---|---|---|---|---|---|
| 1858 | 385 | 70 | 66 | 5465 | 0 |
Out of 4,267 unique maintainers identified on GitHub, we were able to identify a sector for 1,322. Of these, 1,132 were identified as academic, 109 as business, 39 as government, and 42 as nonprofit.
## sectors based on unique maintainers
cran_repos_unique <- cran_repos %>%
distinct(email, .keep_all = T)
pander(table(cran_repos_unique$Sector, useNA = "always"))
| Academic | Business | Government | Nonprofit | Unknown | NA |
|---|---|---|---|---|---|
| 1132 | 109 | 39 | 42 | 2945 | 0 |
Based on all GitHub R packages that we were able to extract a sector from, 78% are academic, 16% are business, 3% are government, and 3% are nonprofit. When looking at the unique maintainers, 86% are academic, 8% are business, 3% are government, and 3% are nonprofit.
# Calculate counts by sector (All packages on GitHub)
cran_repo_sector_counts <- cran_repos %>%
filter(Sector != "Unknown") %>%
count(Sector) %>%
mutate(proportion = n / sum(n),
proportion_label = paste0(round(proportion * 100, 1), "%"))
# Save plot
cran_repo_sector_counts_plot <- ggplot(cran_repo_sector_counts, aes(x = Sector, y = n)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = proportion_label), vjust = -0.3) +
ylab("Count of Packages") +
ylim(c(0, 2000))+
ggtitle(label = "Number of R Packages on GitHub by Maintainer's Sector")+
labs(caption = "*70% Unknown for packages (removed from analysis)")+
theme_clean()
cran_repo_sector_counts_plot
# Calculate counts by sector (For unique Maintainers on GitHub)
cran_repo_sector_counts_unique <- cran_repos_unique %>%
distinct(email, .keep_all = T)%>%
filter(Sector != "Unknown") %>%
count(Sector) %>%
mutate(proportion = n / sum(n),
proportion_label = paste0(round(proportion * 100, 1), "%"))
# Save plot
cran_repo_sector_counts_unique_plot <-ggplot(cran_repo_sector_counts_unique, aes(x = Sector, y = n)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = proportion_label), vjust = -0.3) +
ylab("Count of Maintainers") +
ylim(c(0, 2000))+
ggtitle(label = "Number of R Package Maintainers on GitHub by Sector")+
labs(caption = "*69% Unknown for unique maintainers (removed from analysis)")+
theme_clean()
cran_repo_sector_counts_unique_plot
By number of packages, RStudio develops the most R packages on GitHub by a wide margin. However, if we look at unique maintainers only, the gap between RStudio and other institutions becomes much smaller. It seems that there are a few maintainers who develop many of the R packages. We also note that packages without a sector also lack an institution label (the two coincide).
### sorting to the top 10 most common institutions for packages
top10_Institutions_GitHub <- sort(table(cran_repos$Institution), decreasing = T)
top10_Institutions_GitHub <- as.data.frame(head(top10_Institutions_GitHub, 10))
colnames(top10_Institutions_GitHub) <- c("Institution", "Freq")
### joining to institution dataframe to get sector variable
top10_Institutions_GitHub <- cran_repos %>%
right_join(top10_Institutions_GitHub, by = "Institution")%>%
distinct(Institution, .keep_all = T)%>%
select(Institution, Sector, Freq)%>%
arrange(desc(Freq))
### sorting to the top 10 most common institutions for distinct maintainers
top10_Institutions_GitHub_unique <- sort(table(cran_repos_unique$Institution), decreasing = T)
top10_Institutions_GitHub_unique <- as.data.frame(head(top10_Institutions_GitHub_unique, 10))
colnames(top10_Institutions_GitHub_unique) <- c("Institution", "Freq")
### joining to institution unique dataframe to get sector variable
top10_Institutions_GitHub_unique <- cran_repos_unique %>%
right_join(top10_Institutions_GitHub_unique, by = "Institution")%>%
distinct(Institution, .keep_all = T)%>%
select(Institution, Sector, Freq)%>%
arrange(desc(Freq))
### Graph output of top 10 institutions for packages
ggplot(top10_Institutions_GitHub, aes(x = reorder(Institution, Freq), y = Freq, fill = Sector))+
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(expand = c(0,0), limits = c(0, 300)) +
labs(x = "", y = "Number of Packages",
title = "Top 10 Institutions for R Packages on GitHub" ) +
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme_clean()+
theme(
plot.title = element_text(size = 13))
### Graph output of top 10 institutions for unique maintainers
ggplot(top10_Institutions_GitHub_unique, aes(x = reorder(Institution, Freq), y = Freq, fill = Sector))+
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(expand = c(0,0), limits = c(0, 300)) +
labs(x = "", y = "Number of Maintainers",
title = "Top 10 Institutions for Unique Maintainers on GitHub" ) +
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme_clean()+
theme(
plot.title = element_text(size = 13))
### Table output of top 10 Institutions for packages
top10_Institutions_GitHub %>%
kbl(caption = "Most Frequent Institutions for Packages on GitHub", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| Institution | Sector | Freq |
|---|---|---|
| RStudio | Business | 278 |
| University of California-Berkeley | Academic | 53 |
| Harvard University | Academic | 51 |
| NetEase | Business | 38 |
| University of Wisconsin-Madison | Academic | 36 |
| University of Oslo | Academic | 35 |
| University of Michigan-Ann Arbor | Academic | 33 |
| University College London | Academic | 32 |
| University of Alberta | Academic | 28 |
| French National Centre for Scientific Research | Government | 25 |
### Table output of top 10 Institutions for unique maintainers
top10_Institutions_GitHub_unique %>%
kbl(caption = "Most Frequent Institutions for Unique Maintainers on GitHub", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| Institution | Sector | Freq |
|---|---|---|
| RStudio | Business | 45 |
| Harvard University | Academic | 24 |
| University of Michigan-Ann Arbor | Academic | 22 |
| NetEase | Business | 22 |
| University of Washington-Seattle Campus | Academic | 19 |
| University of California-Berkeley | Academic | 14 |
| University College London | Academic | 14 |
| University of Wisconsin-Madison | Academic | 14 |
| Copenhagen University | Academic | 14 |
| University of Oslo | Academic | 12 |
We can identify the year a repository was created from its creation timestamp on GitHub, one of the variables we collected during GitHub data collection.
We can now see how the distribution of sectors changes over time and identify the years in which we were able to identify the most sectors. We run the same analysis twice: once for sectors of all R packages on GitHub and once for sectors of all unique R maintainers on GitHub.
The number of packages and maintainers with an identified sector generally increases from year to year up to 2020, where there is a dip in the number of packages and maintainers being registered on GitHub. The sector distribution stays essentially the same from year to year in both plots: academic makes up the majority, with slight fluctuations in the other sectors.
cran_repos_time <- cran_repos %>%
filter(Sector != "Unknown" & !is.na(year_created)) %>%
ggplot(aes(x = as.factor(year_created), fill = Sector)) +
geom_bar() +
labs(
x = "Year",
y = "Number of Packages",
title = "Change in Sectors Over Time for R Packages on GitHub"
) +
theme_clean()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))+
ylim(c(0, 300))
cran_repos_time
cran_repos_unique_time <- cran_repos_unique %>%
filter(Sector != "Unknown" & !is.na(year_created)) %>%
ggplot(aes(x = as.factor(year_created), fill = Sector)) +
geom_bar() +
labs(
x = "Year",
y = "Number of Maintainers",
title = "Change in Sectors Over Time for Unique R Maintainers on GitHub"
) +
theme_clean()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))+
ylim(c(0, 300))
cran_repos_unique_time
Now we look at distributions of all R contributors on GitHub. After GitHub data collection, we were able to identify 14,328 unique R contributors.
cran_users_unique <- cran_users %>%
distinct(login, .keep_all = T)
nrow(cran_users_unique)
[1] 14328
We also collected commit data for each of the unique R contributors. We join this back with our unique R contributors dataframe to combine commit, sector, country, and organization variables.
### summing up total additions for each user within each repo
user_commits_total <- user_commits %>%
group_by(slug, login) %>%
summarise(total_additions = sum(additions)) %>%
ungroup()
### join back to unique users dataframe for other variables
user_commits_total <- user_commits_total %>%
left_join(cran_users_unique, by = "login") %>%
select(slug, login,name, email, total_additions, organization, sector, country)
cran_repos2 <- cran_repos %>%
select(slug, year_created, stargazer_count, fork_count, Downloads_All_Time, Downloads_Normalized, Reverse_Depends_Count)
user_commits_total <- user_commits_total %>%
left_join(cran_repos2, by = "slug")
### Rename NA sectors to Unknown
user_commits_total <- user_commits_total %>%
mutate(sector = ifelse(is.na(sector) | sector == "Unknown", "Unknown", sector))
For the 14,328 unique R contributors on GitHub, we were able to identify a sector for 2,573. Of these, 1,870 come from academic, 482 from business, 84 from government, and 137 from nonprofit.
pander(table(cran_users_unique$sector, useNA = "always"))
| Academic | Business | Government | Nonprofit | Unknown | NA |
|---|---|---|---|---|---|
| 1870 | 482 | 84 | 137 | 11755 | 0 |
For unique R developers (contributors to a slug) on GitHub, 73% are identified as academic, 19% as business, 5% as nonprofit, and 3% as government.
# Calculate counts by sector (all unique contributors on GitHub)
cran_user_sector_counts <- cran_users_unique %>%
filter(sector != "NA" & sector != "Unknown") %>%
count(sector) %>%
mutate(proportion = n / sum(n),
proportion_label = paste0(round(proportion * 100, 1), "%")) %>%
arrange(desc(proportion)) %>%
mutate(sector = factor(sector, levels = unique(sector)))
# Save plot
cran_user_sector_counts_plot <- ggplot(cran_user_sector_counts, aes(x = sector, y = n)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = proportion_label), vjust = -0.3) +
ylab("Count of Developers") +
ylim(c(0, 2000))+
ggtitle(label = "Number of R Package Developers on GitHub by Sector")+
labs(caption = "*Developers without sector information are removed in this figure (82% of 14,328 R Developers)")+
theme_clean()
cran_user_sector_counts_plot
We now attribute contribution to sectors using two methods. First, we look at equal contribution, where each member of a repository receives an equal fraction of credit regardless of how much they contributed. So, if a repository has five members, each member gets 0.2 credit, and the fractions are then aggregated by sector. We also count the fractions that fall to unknown sectors, but we remove them from any graphical displays, as we already know this will be the largest share.
Note: This is different from looking at the unique user distribution, as it counts users repeatedly if they are members of multiple repositories.
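The equal-contribution scheme can be illustrated on a toy table. The sketch below uses made-up repositories, logins, and sectors (all hypothetical) and base R helpers instead of the dplyr pipeline used in the analysis, and it assumes each login appears at most once per repository.

```r
# Hypothetical contributor table: one row per (repository, contributor)
contrib <- data.frame(
  slug   = c("a/pkg1", "a/pkg1", "a/pkg1", "a/pkg1", "a/pkg1", "b/pkg2", "b/pkg2"),
  login  = c("u1", "u2", "u3", "u4", "u5", "u1", "u6"),
  sector = c("Academic", "Academic", "Business", "Unknown", "Unknown",
             "Academic", "Government")
)

# Each contributor gets 1 / (number of contributors in that repository)
n_per_repo <- ave(seq_along(contrib$slug), contrib$slug, FUN = length)
contrib$equal_fraction <- 1 / n_per_repo

# Sum the fractions by sector; every repository contributes exactly 1 in total
sector_credit <- tapply(contrib$equal_fraction, contrib$sector, sum)
print(sector_credit)
```

Here the five members of the first repository each receive 0.2 and the two members of the second each receive 0.5. The login `u1` is counted in both repositories, which is the repeat counting described in the note above.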
# 1. Count the number of unique login per slug.
login_counts <- user_commits_total %>%
group_by(slug) %>%
summarise(num_logins = n_distinct(login))
# 2. Compute the contribution fraction for each login.
user_commits_total <- user_commits_total %>%
left_join(login_counts, by = "slug") %>%
mutate(contribution_fraction_equal = 1 / num_logins) %>%
select(-num_logins) # Removing the num_logins column as it's no longer needed
# 3. Sum the contribution fraction for each sector per slug.
sector_contribution <- user_commits_total %>%
group_by(slug, sector) %>%
summarise(total_contribution_fraction = sum(contribution_fraction_equal))
# 4. Aggregate the contribution fraction for each sector across all slugs.
sector_aggregated <- sector_contribution %>%
group_by(sector) %>%
summarise(overall_contribution_fraction = sum(total_contribution_fraction))
# Calculate the total overall contribution fraction over all sectors
total_overall_contribution = sum(sector_aggregated$overall_contribution_fraction)
# Calculate the percentage contribution for each sector
sector_aggregated = sector_aggregated %>%
mutate(percentage_contribution = round((overall_contribution_fraction / total_overall_contribution) * 100, 1))
### Plot percentage contribution
sector_aggregated$percentage_label <- scales::percent(sector_aggregated$percentage_contribution / 100)
Based on equal contribution of each unique login to each unique repository, we would attribute 80% of credit to the academic sector, 15% to business, 2% to government, and 3% to nonprofit. Note that we removed Unknown from the distribution; it would otherwise receive 78% of the credit. The percentages listed here are therefore shares of the credit whose sector we do know.
### Excluding the unknown percentage in the table
total_excluding_unknown <- sum(sector_aggregated$overall_contribution_fraction[sector_aggregated$sector != "Unknown"])
### recalculating what percentages would be without unknown
sector_aggregated <- sector_aggregated %>%
mutate(percentage_contribution_excl_unknown = ifelse(sector != "Unknown",
round((overall_contribution_fraction / total_excluding_unknown) * 100, 1), NA_real_))
### making labels
sector_aggregated$percentage_label_excl_unknown <- scales::percent(sector_aggregated$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
ggplot(sector_aggregated %>% filter(sector != "Unknown"), aes(x = sector, y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = -0.5, size = 4) +
geom_text(aes(label = paste0("(", round(overall_contribution_fraction, 2), ")")), position = position_dodge(width = 0.9), vjust = -2.5)+
labs(title = "Percentage Contribution by Sector (Equal)",
x = "Sector",
y = "Percentage Contribution") +
theme_clean() +
labs(caption = "*Excludes the percentage contribution from unknown sector (77.7%)")+
ylim(0,100)
We can also attribute contribution to sectors based on the lines of code each user added to a given repository. The more lines of code a user added to that repository, the more credit that user gets. So, if a repository has 500 total lines of code and one user wrote 300 of them, that user gets 0.6 of the credit. After calculating this, we again apply the fractional counting method to the sectors.
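The lines-of-code weighting can be sketched the same way on a toy table. The repositories, logins, sectors, and addition counts below are hypothetical, and the snippet uses base R instead of the dplyr pipeline used in the analysis.

```r
# Hypothetical additions per (repository, contributor)
adds <- data.frame(
  slug      = c("a/pkg1", "a/pkg1", "a/pkg1", "b/pkg2", "b/pkg2"),
  login     = c("u1", "u2", "u3", "u1", "u4"),
  sector    = c("Academic", "Business", "Unknown", "Academic", "Government"),
  additions = c(300, 150, 50, 40, 160)
)

# A contributor's share is their additions divided by the repository total
repo_total <- ave(adds$additions, adds$slug, FUN = sum)
adds$loc_fraction <- adds$additions / repo_total

# Aggregate the shares by sector
sector_credit_loc <- tapply(adds$loc_fraction, adds$sector, sum)

# Renormalise over known sectors only, mirroring how Unknown is dropped from the figures
known <- sector_credit_loc[names(sector_credit_loc) != "Unknown"]
known_pct <- round(100 * known / sum(known), 1)
print(known_pct)
```

The first row reproduces the 300-of-500 case from the text, giving a share of 0.6. A repository whose additions sum to zero would produce an undefined share (0/0), so real data may need such rows dropped or handled with `na.rm = TRUE` when summing.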
# Calculate the total code additions for each slug (project/repository identifier)
# Grouping by the slug, and then summarizing the total additions for each slug.
slug_totals <- user_commits_total %>%
group_by(slug) %>%
summarise(total_code_for_slug = sum(total_additions))
# Compute the contribution fraction for each user.
# This is done by joining the user's total additions with the total code additions for their respective slug,
# and then computing the user's contribution as a fraction of the slug's total.
user_commits_total <- user_commits_total %>%
left_join(slug_totals, by = "slug") %>%
mutate(contribution_fraction_loc = total_additions / total_code_for_slug)
# Compute the total contribution fraction for each combination of slug and sector.
# This groups the data by slug and sector, and then sums up the contribution fractions.
sector_addition_contribution <- user_commits_total %>%
group_by(slug, sector) %>%
summarise(total_addition_contribution = sum(contribution_fraction_loc))
# Aggregate the contributions at the sector level.
# This groups by the sector and then computes the overall contribution fraction for each sector.
sector_aggregated_additions <- sector_addition_contribution %>%
group_by(sector) %>%
summarise(overall_addition_contribution = sum(total_addition_contribution, na.rm = TRUE))
# Compute the total overall additions across all sectors.
total_overall_additions = sum(sector_aggregated_additions$overall_addition_contribution)
# Calculate the percentage of additions for each sector relative to the total overall additions.
sector_aggregated_additions$percentage_additions = round((sector_aggregated_additions$overall_addition_contribution / total_overall_additions) * 100,1)
# Create a label for the percentage values, turning the decimal fraction into a percentage string (e.g., 0.5 becomes "50%").
sector_aggregated_additions$percentage_label_additions = scales::percent(sector_aggregated_additions$percentage_additions / 100)
After these calculations, 83% of credit can be attributed to the academic sector, 13% to business, 2% to government, and 2% to nonprofit. The share attributed to Unknown decreases from 77.7% to 75.6%.
# Calculate the total code additions while excluding the 'Unknown' sector.
total_excluding_unknown_add <- sum(sector_aggregated_additions$overall_addition_contribution[sector_aggregated_additions$sector != "Unknown"])
# Compute the percentage contribution for each sector relative to the total (excluding 'Unknown' sector).
# If the sector is 'Unknown', set the percentage as NA.
sector_aggregated_additions <- sector_aggregated_additions %>%
mutate(percentage_contribution_excl_unknown = ifelse(sector != "Unknown",
round((overall_addition_contribution / total_excluding_unknown_add) * 100, 1), NA_real_))
# Create a label for the percentage values that excludes 'Unknown' sector, turning the decimal fraction into a percentage string.
sector_aggregated_additions$percentage_label_excl_unknown <- scales::percent(sector_aggregated_additions$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
# Visualize data
ggplot(sector_aggregated_additions %>% filter(sector != "Unknown"), aes(x = sector, y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = -0.5, size = 4) + # Adjust vjust and size as needed
geom_text(aes(label = paste0("(", round(overall_addition_contribution, 2), ")")), position = position_dodge(width = 0.9), vjust = -2.5)+
labs(#title = "Percentage Contribution by Sector (Weighted)",
x = "Sector",
y = "Percentage Contribution") +
theme_clean() +
#labs(caption = "*Excludes the percentage contribution from unknown sector (75.6%)")+
ylim(0,100)+
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 12))
# Compute the total contribution fraction for each combination of slug, sector, and year
sector_addition_contribution_time <- user_commits_total %>%
group_by(slug, sector, year_created) %>%
summarise(total_addition_contribution = sum(contribution_fraction_loc), .groups = 'drop')
# Aggregate the contributions at the sector and year level
sector_aggregated_additions_time <- sector_addition_contribution_time %>%
group_by(sector, year_created) %>%
summarise(overall_addition_contribution = sum(total_addition_contribution, na.rm = TRUE), .groups = 'drop')
# Compute the total overall additions across all sectors by year
total_overall_additions_by_year <- sector_aggregated_additions_time %>%
group_by(year_created) %>%
summarise(yearly_total = sum(overall_addition_contribution), .groups = 'drop')
# Calculate the percentage of additions for each sector relative to the total overall additions for each year
sector_aggregated_additions_time <- sector_aggregated_additions_time %>%
left_join(total_overall_additions_by_year, by = "year_created") %>%
mutate(percentage_additions = (overall_addition_contribution / yearly_total) * 100)
# Calculate the total code additions for each year while excluding the 'Unknown' sector
total_excluding_unknown_by_year <- sector_aggregated_additions_time %>%
filter(sector != "Unknown") %>%
group_by(year_created) %>%
summarise(yearly_total_excl_unknown = sum(overall_addition_contribution), .groups = 'drop')
# Compute the percentage contribution for each sector by year relative to the year's total excluding 'Unknown'
sector_aggregated_additions_time <- sector_aggregated_additions_time %>%
left_join(total_excluding_unknown_by_year, by = "year_created") %>%
mutate(percentage_contribution_excl_unknown = ifelse(sector != "Unknown" & !is.na(yearly_total_excl_unknown),
(overall_addition_contribution / yearly_total_excl_unknown) * 100,
NA_real_))
# Round the percentages and create labels
sector_aggregated_additions_time$percentage_contribution_excl_unknown <- round(sector_aggregated_additions_time$percentage_contribution_excl_unknown, 1)
sector_aggregated_additions_time$percentage_label_excl_unknown <- ifelse(is.na(sector_aggregated_additions_time$percentage_contribution_excl_unknown),
NA_character_,
percent(sector_aggregated_additions_time$percentage_contribution_excl_unknown / 100))
# Assuming sector_aggregated_additions_time contains the necessary processed data
# Filter out the 'Unknown' sector for plotting
plot_data <- sector_aggregated_additions_time %>%
filter(sector != "Unknown",
year_created != "NA" & year_created != "2023")
The following graph shows fractional credit for sectors over time.
# Stacked Bar Chart for Yearly Totals
R_Sectors_time <- ggplot(plot_data, aes(x = year_created, y = overall_addition_contribution, fill = sector)) +
geom_bar(stat = "identity") +
scale_fill_westat(option = "BLUES", drop = FALSE)+
labs(x = "", y = "Fractional Count of Packages", title = "Fractional Count of Packages for Sector by Year") + # Fractional Count of Packages for Sector by Year, y-axis: Fractional Count of Packages
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom")
R_Sectors_time
# ggsave(filename = "\\\\westat.com\\dfs\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\New Graphs\\R_Sectors_time.png", plot = R_Sectors_time, width = 8, height = 6, dpi = 300)
# Line Chart for Percentages by Sector (excluding 'Unknown')
ggplot(plot_data,
aes(x = year_created, y = percentage_contribution_excl_unknown, color = sector, group = sector)) +
geom_line() +
geom_point() +
labs(x = "", y = "Percentage of Total Packages", title = "Weighted Sector Contribution by Year") +
scale_color_westat(option = "BLUES", drop = FALSE) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom")
# Create the stacked bar plot
ggplot(plot_data, aes(x = year_created, y = percentage_contribution_excl_unknown, fill = sector)) +
geom_bar(stat = "identity") +
scale_fill_westat(option = "BLUES", drop = FALSE) +
labs(x = "", y = "Percentage Contribution", title = "Weighted Sector Contribution by Year") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), # Adjust the angle of the x-axis labels for readability
legend.position = "bottom") # Place the legend at the bottom
The diverstidy function that we use to extract a country for each user can return messy output, sometimes identifying multiple countries for a single user, so we need to clean this up before analyzing country distributions. There were 427 unique users with multiple countries supplied; we reviewed these manually and decided whether all countries should be kept or some should be deleted. The extracted country can be based on the email, location, company, or organization that a given user has listed.
We filter out NA values here and replace them with “Unknown”.
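A minimal sketch of this kind of cleanup on toy data may help make the steps concrete. The logins, country strings, and the helper name `clean_country` are hypothetical and only illustrate the idea of de-duplicating pipe-separated values and mapping missing markers to “Unknown”; it is not the exact pipeline used on `cran_users_unique`.

```r
library(dplyr)

# Toy input: hypothetical logins and raw country strings
toy_users <- tibble(
  login   = c("user_a", "user_b", "user_c", "user_d"),
  country = c("Germany|Germany", "United States|NA|Canada", "NA", NA)
)

# Hypothetical helper: split on "|", drop missing markers, keep unique names
clean_country <- function(x) {
  parts <- strsplit(as.character(x), split = "|", fixed = TRUE)
  vapply(parts, function(p) {
    p <- unique(p[!is.na(p) & p != "NA"])
    # Users with no usable country are labelled "Unknown"
    if (length(p) == 0) "Unknown" else paste(p, collapse = ",")
  }, character(1))
}

toy_clean <- toy_users %>%
  mutate(country_final = clean_country(country))

print(toy_clean)
```

Users who still carry more than one country after this step keep a comma-separated value, which is the form a later credit-splitting step can work from.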
cran_users_unique <- cran_users_unique %>%
mutate(
country_fixed = strsplit(as.character(country), split = "\\|") %>% # Split on "|"
map(~unique(.)) %>% # Keep only unique values
sapply(paste, collapse = ",") # Collapse back into a string
)
cran_users_unique <- cran_users_unique %>%
mutate(country_fixed = ifelse(country_fixed == "NA", NA_character_, country_fixed))
cran_users_unique <- cran_users_unique %>%
mutate(
country_fixed = strsplit(country_fixed, split = ",") %>% # Split on comma
map(~ .[!. %in% "NA"]) %>% # Remove literal "NA" entries
sapply(paste, collapse = ",") # Collapse back into a string
)
cran_users_unique <- cran_users_unique %>%
left_join(user_countries, by = "login")
cran_users_unique <- cran_users_unique %>%
mutate(country_final = ifelse(is.na(country_final), country_fixed, country_final))
cran_users_unique <- cran_users_unique %>%
mutate(country_final = ifelse(is.na(country_final) | country_final == "NA", "Unknown", country_final))
Based on the unique R GitHub users, the United States is the most frequent country, followed by Germany and the United Kingdom. Out of 14,328 unique users, there were 5,575 for whom we were unable to find a country.
### sum of Unknowns for country
sum(cran_users_unique$country_final == "Unknown")
[1] 5575
### sorting to the top 10 most common countries for distinct GitHub users
top10_Countries_GitHub_users_unique <- cran_users_unique %>%
filter(country_final != "Unknown")
top10_Countries_GitHub_users_unique <- sort(table(top10_Countries_GitHub_users_unique$country_final), decreasing = T)
top10_Countries_GitHub_users_unique <- as.data.frame(head(top10_Countries_GitHub_users_unique , 10))
colnames(top10_Countries_GitHub_users_unique ) <- c("country_final", "Freq")
### Graph output of top 10 countries for unique maintainers
ggplot(top10_Countries_GitHub_users_unique , aes(x = reorder(country_final, Freq), y = Freq))+
geom_bar(stat = "identity", fill = westat_blue()) +
coord_flip() +
scale_y_continuous(expand = c(0,0)) +
labs(x = "", y = "Number of GitHub Users",
title = "Top 10 Countries for R Users on GitHub" ) +
ylim(c(0, 3000))+
scale_fill_westat(option = "BLUES")+
theme_clean()+
theme(
plot.title = element_text(size = 13))+
labs(caption = "*Excludes count from unknown countries (5575)")
### Table output of top 10 countries for GitHub users
top10_Countries_GitHub_users_unique %>%
kbl(caption = "Most Frequent Countries for R Developers on GitHub", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| country_final | Freq |
|---|---|
| United States | 2809 |
| Germany | 854 |
| United Kingdom | 660 |
| Canada | 410 |
| France | 352 |
| Australia | 322 |
| China | 286 |
| Netherlands | 264 |
| Switzerland | 226 |
| India | 214 |
As stated earlier, some logins have multiple countries listed. For these logins, we split both the equal contribution fraction and the lines-of-code measures equally among the countries. So, if a user with two countries contributed to a slug with 4 unique users, each country gets 0.125 credit under equal contribution (the user's 0.25 share divided by 2). For lines of code, if that user had 500 additions, each country would get 250 additions. After doing this, we see that there are 123 unique countries identified.
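The arithmetic above can be sketched on a toy slug with four users. The data values and object names (`toy_commits`, `toy_split`, `n_countries`) are hypothetical, and counting the countries per row with `str_count` is an assumption made for this illustration rather than a description of the function defined in this document.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy commit rows for one slug with 4 unique users (hypothetical values)
toy_commits <- tibble(
  slug = "owner/pkg",
  login = c("u1", "u2", "u3", "u4"),
  country_final = c("Germany,France", "United States", "Unknown", "Canada"),
  total_additions = c(500, 300, 100, 100),
  contribution_fraction_equal = 0.25
)

toy_split <- toy_commits %>%
  # Number of countries listed for each row (commas + 1)
  mutate(n_countries = str_count(country_final, ",") + 1) %>%
  # One row per (user, country) pair
  separate_rows(country_final, sep = ",\\s*") %>%
  # Share the user's credit equally among that user's countries
  mutate(
    total_additions = total_additions / n_countries,
    contribution_fraction_equal = contribution_fraction_equal / n_countries
  )

print(toy_split)
```

In this toy case the user with two countries yields two rows, each with 250 additions and 0.125 equal credit, and the equal credit across the slug still sums to 1.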
# Function to handle the splitting and division for multiple countries
process_multiple_countries <- function(df) {
num_countries <- length(str_split(df$country_final, ",\\s*")[[1]])
df %>%
separate_rows(country_final, sep = ",\\s*") %>%
mutate(
total_additions = total_additions / num_countries,
contribution_fraction_equal = contribution_fraction_equal / num_countries,
contribution_fraction_loc = contribution_fraction_loc / num_countries
)
}
# join country variable back to commit table
user_countries <- cran_users_unique %>%
select(login, country_final)
user_commits_total <- user_commits_total %>%
left_join(user_countries, by = "login")
# Replace NA values in 'country_final' with 'Unknown'
user_commits_total$country_final[is.na(user_commits_total$country_final)] <- "Unknown"
# Process rows with multiple countries
multi_country_rows <- user_commits_total %>%
filter(str_detect(country_final, ",")) %>%
group_by(login) %>%
do(process_multiple_countries(.))
# Exclude multi-country rows from the original df and bind the processed rows
user_commits_total <- user_commits_total %>%
filter(!str_detect(country_final, ",")) %>%
bind_rows(multi_country_rows)
Instead of grouping by sector, we group by country here.
# Sum the contribution fraction for each country per slug.
country_contribution <- user_commits_total %>%
group_by(slug, country_final) %>%
summarise(total_contribution_fraction = sum(contribution_fraction_equal))
# Aggregate the contribution fraction for each country across all slugs
country_aggregated <- country_contribution %>%
group_by(country_final) %>%
summarise(overall_contribution_fraction = sum(total_contribution_fraction))
# Calculate the total overall contribution fraction over all countries
total_overall_contribution = sum(country_aggregated$overall_contribution_fraction)
# Calculate the percentage contribution for each country
country_aggregated = country_aggregated %>%
mutate(percentage_contribution = round((overall_contribution_fraction / total_overall_contribution) * 100, 1))
### Plot percentage contribution
country_aggregated$percentage_label <- scales::percent(country_aggregated$percentage_contribution / 100)
If we give equal contributions to countries, then the United States gets 31.1% of the credit, followed by Germany with 10.9%. These figures exclude the contribution counted towards Unknown (38.5%), so they are percentages of the share for which a country is known (61.5%).
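To make the difference between the two denominators concrete, here is a small sketch on made-up numbers. The table `toy_country` and its values are hypothetical; only the column names follow the ones used in this analysis.

```r
library(dplyr)

# Toy aggregated credit per country (hypothetical values)
toy_country <- tibble(
  country_final = c("United States", "Germany", "Unknown"),
  overall_contribution_fraction = c(30, 10, 60)
)

toy_pct <- toy_country %>%
  mutate(
    # Share of all credit, Unknown included in the denominator
    pct_all = 100 * overall_contribution_fraction /
      sum(overall_contribution_fraction),
    # Total credit for rows where the country is known
    known_total = sum(overall_contribution_fraction[country_final != "Unknown"]),
    # Share of known credit only; Unknown itself gets NA
    pct_excl_unknown = if_else(
      country_final != "Unknown",
      100 * overall_contribution_fraction / known_total,
      NA_real_
    )
  )

print(toy_pct)
```

With these toy numbers the first country holds 30% of all credit but 75% of the known credit, which is why the excluded Unknown share should be reported alongside the rescaled percentages.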
total_excluding_unknown <- sum(country_aggregated$overall_contribution_fraction[country_aggregated$country_final != "Unknown"])
country_aggregated <- country_aggregated %>%
mutate(percentage_contribution_excl_unknown = ifelse(country_final != "Unknown",
round((overall_contribution_fraction / total_excluding_unknown) * 100, 1), NA_real_))
country_aggregated$percentage_label_excl_unknown <- scales::percent(country_aggregated$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
top_10_countries <- country_aggregated %>%
arrange(desc(percentage_contribution_excl_unknown)) %>%
head(10)
ggplot(top_10_countries, aes(x = reorder(country_final, percentage_contribution_excl_unknown), y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = .5, size = 4, hjust = -.25) +
geom_text(aes(label = paste0("(", round(overall_contribution_fraction, 2), ")")), position = position_dodge(width = 0.9), vjust = .25, hjust = -1)+# Adjust vjust and size as needed
labs(title = "Percentage Contribution by Country (Equal - Top 10 Countries)",
x = "Country",
y = "Percentage Contribution") +
theme_clean() +
ylim(0,100)+
coord_flip()+
theme(plot.title = element_text(size = 10))+
labs(caption = "*Excludes the percentage contribution from unknown countries (38.5%)")
Now, we base the contribution on additions for country, just as we did for sector.
country_addition_contribution <- user_commits_total %>%
group_by(slug, country_final) %>%
summarise(total_addition_contribution = sum(contribution_fraction_loc))
country_aggregated_additions <- country_addition_contribution %>%
group_by(country_final) %>%
summarise(overall_addition_contribution = sum(total_addition_contribution, na.rm = TRUE))
total_overall_additions = sum(country_aggregated_additions$overall_addition_contribution)
country_aggregated_additions$percentage_additions = round((country_aggregated_additions$overall_addition_contribution / total_overall_additions) * 100,1)
country_aggregated_additions$percentage_label_additions = scales::percent(country_aggregated_additions$percentage_additions / 100)
Based on additions, the percentage attributed to Unknown decreases to 34.1%, so the percentage for which we know the country increases to 65.9% overall. The United States is still at the top, but its share decreases slightly to 30.9%. The top 10 and their order stay the same, but the percentages increase slightly for the countries towards the bottom.
total_excluding_unknown <- sum(country_aggregated_additions$overall_addition_contribution[country_aggregated_additions$country_final != "Unknown"])
country_aggregated_additions <- country_aggregated_additions %>%
mutate(percentage_contribution_excl_unknown = ifelse(country_final != "Unknown",
round((overall_addition_contribution / total_excluding_unknown) * 100, 1), NA_real_))
country_aggregated_additions$percentage_label_excl_unknown <- scales::percent(country_aggregated_additions$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
top_10_countries_additions <- country_aggregated_additions %>%
arrange(desc(percentage_contribution_excl_unknown)) %>%
head(10)
ggplot(top_10_countries_additions, aes(x = reorder(country_final, percentage_contribution_excl_unknown), y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = .5, size = 6, hjust = -.12) +
geom_text(aes(label = paste0("(", round(overall_addition_contribution, 2), ")")), position = position_dodge(width = 0.9), vjust = .5, hjust = -1.1, size = 5)+# Adjust vjust and size as needed
labs( x = "",
y = "Percentage Contribution") +
theme_clean() +
ylim(0, 100)+
coord_flip()+
theme(axis.text = element_text(size = 14),
axis.title = element_text(size = 12))
# Step 1: Sum the contribution fraction for each country per slug, per year
country_contribution_by_year <- user_commits_total %>%
group_by(slug, country_final, year_created) %>%
summarise(total_contribution_fraction = sum(contribution_fraction_loc, na.rm = T), .groups = 'drop')
# Step 2: Aggregate the contribution fraction for each country by year
country_aggregated_by_year <- country_contribution_by_year %>%
group_by(country_final, year_created) %>%
summarise(overall_contribution_fraction = sum(total_contribution_fraction), .groups = 'drop')
# Step 3: Exclude 'Unknown' and determine the top ten countries for each year
country_aggregated_by_year_excl_unknown <- country_aggregated_by_year %>%
filter(country_final != "Unknown")
# Step 4: Calculate the total overall contribution by year, excluding 'Unknown'
total_overall_contribution_by_year_excl_unknown <- country_aggregated_by_year_excl_unknown %>%
group_by(year_created) %>%
summarise(yearly_total_excl_unknown = sum(overall_contribution_fraction), .groups = 'drop')
# Now compute the percentage of contribution for each of the top countries, excluding 'Unknown'
country_aggregated_by_year_excl_unknown <- country_aggregated_by_year_excl_unknown %>%
left_join(total_overall_contribution_by_year_excl_unknown, by = "year_created") %>%
mutate(percentage_contribution_excl_unknown = (overall_contribution_fraction / yearly_total_excl_unknown) * 100) %>%
arrange(year_created, desc(percentage_contribution_excl_unknown))
# Step 5: Get the top ten countries by year, excluding 'Unknown'
top_countries_by_year_excl_unknown <- country_aggregated_by_year_excl_unknown %>%
group_by(year_created) %>%
top_n(10, wt = percentage_contribution_excl_unknown) %>%
ungroup()
# Drop rows with a missing year and the year 2023 for plotting
plot_data2 <- top_countries_by_year_excl_unknown %>%
filter(year_created != "NA" & year_created != "2023")
plot_data3 <- top_countries_by_year_excl_unknown %>%
filter(year_created != "NA" & year_created != "2023",
country_final %in% c("United States", "Germany", "United Kingdom", "France", "Canada", "Australia", "Netherlands", "Switzerland", "Spain", "China"))
The following graph shows fractional credit over time for every country that ranked in the top ten in at least one year.
my_colors <- c("#FF0000", "#00FF00", "#0000FF", "#FFFF00", "#FF00FF", "#00FFFF", "#000000",
"#800000", "#008000", "#000080", "#808000", "#800080", "#008080", "#808080",
"#C00000", "#00C000", "#0000C0", "#C0C000", "#C000C0", "#00C0C0",
"#400000", "#004000", "#000040", "#404000", "#400040") # Define more colors as needed
# Stacked Bar Chart for Yearly Totals
ggplot(plot_data2, aes(x = year_created, y = overall_contribution_fraction, fill = country_final)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = my_colors) +
labs(x = "", y = "Number of Packages", title = "Weighted Country Contribution by Year", fill = "Country") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom")
This subsets to the top 10 countries that we identified previously.
my_colors <- c("#6B8E23", "#8FBC8F", "#2E8B57", "#4682B4", "#87CEEB",
"#4169E1", "#B0C4DE", "#D2691E", "#CD853F", "#F4A460")
# Stacked Bar Chart for Yearly Totals
R_Country_time <- ggplot(plot_data3, aes(x = year_created, y = overall_contribution_fraction, fill = country_final)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = my_colors) +
labs(x = "", y = "Fractional Count of Packages", title = "Top Countries by Fractional Count of Packages", fill = "Country") + # Top Countries by Fractional Count of Packages, y-axis: Fractional Count of Packages
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom")
R_Country_time
## ggsave(filename = "\\\\westat.com\\dfs\\DVSTAT\\Individual Directories\\Askew\\Paper_Data\\New Graphs\\R_Country_time.png", plot = R_Country_time, width = 8, height = 6, dpi = 300)
We also have the organization variable for some users. It is tied to the sector variable, so if we were not able to identify a sector, we were also not able to identify an organization.
# Replace NA values in 'organization' with 'Unknown'
user_commits_total$organization[is.na(user_commits_total$organization)] <- "Unknown"
user_commits_total$organization[user_commits_total$organization == "NA"] <- "Unknown"
cran_users_unique$organization[cran_users_unique$organization == "NA"] <- "Unknown"
If we look at the top 10 most frequent organizations for unique R developers on GitHub, Google has the most with 86, followed by NetEase with 57. Only one organization in the top 10 is from a sector other than business or academic (Broad Institute, nonprofit).
cran_users_unique <- cran_users_unique %>%
filter(organization != "Unknown")
### sorting to the top 10 most common institutions for distinct GitHub users
top10_Institutions_GitHub_users_unique <- sort(table(cran_users_unique$organization), decreasing = T)
top10_Institutions_GitHub_users_unique <- as.data.frame(head(top10_Institutions_GitHub_users_unique, 10))
colnames(top10_Institutions_GitHub_users_unique) <- c("organization", "Freq")
### joining to institution unique dataframe to get sector variable
top10_Institutions_GitHub_users_unique <- cran_users_unique %>%
right_join(top10_Institutions_GitHub_users_unique, by = "organization")%>%
distinct(organization, .keep_all = T)%>%
select(organization, sector, Freq)%>%
arrange(desc(Freq))
### Graph output of top 10 institutions for unique maintainers
ggplot(top10_Institutions_GitHub_users_unique, aes(x = reorder(organization, Freq), y = Freq, fill = sector))+
geom_bar(stat = "identity") +
coord_flip() +
scale_y_continuous(expand = c(0,0)) +
labs(x = "", y = "Number of GitHub Users",
title = "Top 10 Organizations for Unique R Users on GitHub" ) +
ylim(c(0, 200))+
scale_fill_westat(option = "BLUES", drop = FALSE)+
theme_clean()+
theme(
plot.title = element_text(size = 13))+
labs(caption = "*Those without org info are removed in this figure (82% of 14,328 R Developers)")
### Table output of top 10 Institutions for packages
top10_Institutions_GitHub_users_unique %>%
kbl(caption = "Most Frequent Institutions for R Developers on GitHub", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box()
| organization | sector | Freq |
|---|---|---|
| Google | Business | 86 |
| NetEase | Business | 57 |
| University of California-Berkeley | Academic | 46 |
| Broad Institute | Nonprofit | 45 |
| University of Michigan-Ann Arbor | Academic | 41 |
| University of Washington-Seattle Campus | Academic | 40 |
| Harvard University | Academic | 40 |
| Microsoft | Business | 40 |
| RStudio | Business | 33 |
| Smith College | Academic | 32 |
We now look at equal contribution for organizations.
# Sum the contribution fraction for each organization per slug.
org_contribution <- user_commits_total %>%
group_by(slug, organization) %>%
summarise(total_contribution_fraction = sum(contribution_fraction_equal))
# Aggregate the contribution fraction for organization across all slugs
org_aggregated <- org_contribution %>%
group_by(organization) %>%
summarise(overall_contribution_fraction = sum(total_contribution_fraction))
# Calculate the total overall contribution fraction over all organizations
total_overall_contribution = sum(org_aggregated$overall_contribution_fraction)
# Calculate the percentage contribution for each organization
org_aggregated = org_aggregated %>%
mutate(percentage_contribution = round((overall_contribution_fraction / total_overall_contribution) * 100, 1))
### Plot percentage contribution
org_aggregated$percentage_label <- scales::percent(org_aggregated$percentage_contribution / 100)
Again, the equal percentage contribution to Unknown is 77.7%, just as we saw in the sector contribution section. Of the share we do know (611 different organizations), RStudio leads with 7.2%, followed by UCLA with 3%.
total_excluding_unknown <- sum(org_aggregated$overall_contribution_fraction[org_aggregated$organization != "Unknown"])
org_aggregated <- org_aggregated %>%
mutate(percentage_contribution_excl_unknown = ifelse(organization != "Unknown",
round((overall_contribution_fraction / total_excluding_unknown) * 100, 1), NA_real_))
org_aggregated$percentage_label_excl_unknown <- scales::percent(org_aggregated$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
top_10_orgs <- org_aggregated %>%
arrange(desc(percentage_contribution_excl_unknown)) %>%
head(10)
ggplot(top_10_orgs, aes(x = reorder(organization, percentage_contribution_excl_unknown), y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = .5, size = 4, hjust = -.25) +
geom_text(aes(label = paste0("(", round(overall_contribution_fraction, 2), ")")), position = position_dodge(width = 0.9), vjust = .25, hjust = -1)+# Adjust vjust and size as needed
labs(title = "Percentage Contribution by Organization (Equal - Top 10 Organizations)",
x = "Organization",
y = "Percentage Contribution") +
theme_clean() +
ylim(0,100)+
coord_flip()+
theme(plot.title = element_text(size = 7))+
labs(caption = "*Excludes the percentage contribution from unknown organizations (77.7%)")
Now, we base the contribution on additions for organization, just as we did for sector and country.
org_addition_contribution <- user_commits_total %>%
group_by(slug, organization) %>%
summarise(total_addition_contribution = sum(contribution_fraction_loc))
org_aggregated_additions <- org_addition_contribution %>%
group_by(organization) %>%
summarise(overall_addition_contribution = sum(total_addition_contribution, na.rm = TRUE))
total_overall_additions = sum(org_aggregated_additions$overall_addition_contribution)
org_aggregated_additions$percentage_additions = round((org_aggregated_additions$overall_addition_contribution / total_overall_additions) * 100,1)
org_aggregated_additions$percentage_label_additions = scales::percent(org_aggregated_additions$percentage_additions / 100)
Based on additions, the percentage contribution towards Unknown is 75.6%, just as we saw for sector, which is what we expect because the two variables coincide. The percentage coming from RStudio decreases to 5.9% (still number one), and the top 10 and its order change slightly. Notably, Monash University moves from the 10th position to the 4th position when factoring in additions.
total_excluding_unknown <- sum(org_aggregated_additions$overall_addition_contribution[org_aggregated_additions$organization != "Unknown"])
org_aggregated_additions <- org_aggregated_additions %>%
mutate(percentage_contribution_excl_unknown = ifelse(organization != "Unknown",
round((overall_addition_contribution / total_excluding_unknown) * 100, 1), NA_real_))
org_aggregated_additions$percentage_label_excl_unknown <- scales::percent(org_aggregated_additions$percentage_contribution_excl_unknown / 100, accuracy = 0.1)
top_10_orgs_additions <- org_aggregated_additions %>%
arrange(desc(percentage_contribution_excl_unknown)) %>%
head(10)
ggplot(top_10_orgs_additions, aes(x = reorder(organization, percentage_contribution_excl_unknown), y = percentage_contribution_excl_unknown)) +
geom_bar(stat = "identity", fill = westat_blue()) +
geom_text(aes(label = percentage_label_excl_unknown), vjust = .5, size = 4, hjust = -.25) +
geom_text(aes(label = paste0("(", round(overall_addition_contribution, 2), ")")), position = position_dodge(width = 0.9), vjust = .25, hjust = -1)+# Adjust vjust and size as needed
labs(title = "Percentage Contribution by Organization (Weighted - Top 10 Organizations)",
x = "Organization",
y = "Percentage Contribution") +
theme_clean() +
ylim(0,100)+
coord_flip()+
theme(plot.title = element_text(size = 7))+
labs(caption = "*Excludes the percentage contribution from unknown (75.6%)")
We create edgelists for countries and sectors in the following section.
What are the overall structural features of the OSS networks? How do they differ across fields, sectors, institutions, and countries? Units of analysis (OSS actors): projects, categories, developers, institutions, sectors, countries
What are the different communities that can be identified using structural features of the networks? Do they correspond to similarities in languages, methods, location, culture?
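As a rough illustration of how a weighted country edgelist and the RDI ratio can be derived from fractional credit, the sketch below uses two made-up countries, "A" and "B". The toy values and the object names `toy_pairs`, `edges`, and `rdi` are hypothetical; the column names and the ratio mirror the ones used in this analysis (a citing country's share of dependencies going to a cited country, divided by that cited country's share of all dependencies), and the interpretation in the closing note is our reading of that ratio.

```r
library(dplyr)

# Toy citing -> cited pairs with each side's share of its package (hypothetical)
toy_pairs <- tibble(
  Citing_Country = c("A", "A", "B", "B"),
  Cited_Country  = c("A", "B", "A", "B"),
  Citing_Package_Fraction = c(1.0, 0.5, 0.5, 1.0),
  Cited_Package_Fraction  = c(1.0, 1.0, 1.0, 0.5)
)

# Edge weight: product of the two fractions, summed per country pair
edges <- toy_pairs %>%
  mutate(Dependency_Fraction = Citing_Package_Fraction * Cited_Package_Fraction) %>%
  group_by(Cited_Country, Citing_Country) %>%
  summarise(Total_Dependency_Fraction = sum(Dependency_Fraction), .groups = "drop")

rdi <- edges %>%
  # Numerator: share of the citing country's dependencies going to each cited country
  group_by(Citing_Country) %>%
  mutate(Numerator_RDI = Total_Dependency_Fraction / sum(Total_Dependency_Fraction)) %>%
  # Denominator: the cited country's share of all dependencies
  group_by(Cited_Country) %>%
  mutate(cited_total = sum(Total_Dependency_Fraction)) %>%
  ungroup() %>%
  mutate(
    Denominator_RDI = cited_total / sum(Total_Dependency_Fraction),
    RDI = Numerator_RDI / Denominator_RDI
  )

print(rdi)
```

Under this reading, a value above 1 would suggest that the citing country relies on the cited country more than the overall distribution of dependencies would predict, and a value below 1 the opposite.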
### select dependency information for slugs and packages
cran_github_rdi <- cran_github %>%
select(Package, slug, Depends)
### rename columns
colnames(cran_github_rdi) <- c("Citing_Package", "slug", "Dependencies")
### Package citation column will be the unlisted dependencies column
cran_github_rdi$Package_Citation <- cran_github_rdi$Dependencies
### join commits information for the citing packages
cran_github_RDI <- cran_github_rdi %>%
inner_join(user_commits_total, by = "slug")%>%
select(Citing_Package, slug, Dependencies, login,
country_final, total_additions, total_code_for_slug,
contribution_fraction_loc, Package_Citation) %>%
# Remove rows with NA in Depends
filter(!is.na(Package_Citation))
### rename columns on the basis of the citing package
colnames(cran_github_RDI) <- c("Citing_Package", "Citing_Slug", "Dependencies", "Citing_Login", "Citing_Country",
"Citing_Additions", "Citing_Total_Slug_Additions", "Citing_Package_Fraction" , "Package_Citation")
### unlist the dependencies for joining
cran_github_RDI_network <- cran_github_RDI %>%
separate_rows(Package_Citation, sep = ",\\s*") %>%
filter(Package_Citation != "")
#### prepare commits information for cited packages
user_commits_rdi <- user_commits_total %>%
mutate(Package_Citation = str_split(slug, "/", simplify = TRUE)[, 2])%>%
select(login, country_final, total_additions, total_code_for_slug, contribution_fraction_loc, Package_Citation)
colnames(user_commits_rdi) <- c( "Cited_Login", "Cited_Country",
"Cited_Additions", "Cited_Total_Slug_Additions", "Cited_Package_Fraction", "Package_Citation")
### join cited package commit information to citing package dataframe
cran_github_rdi_full <- cran_github_RDI_network %>%
inner_join(user_commits_rdi, by = "Package_Citation")
### create dependency_fraction = citing package fraction multiplied by cited package fraction
cran_github_rdi_grouped <- cran_github_rdi_full %>%
mutate(Dependency_Fraction = Citing_Package_Fraction * Cited_Package_Fraction)
# Group by Cited Country and Citing Country, and sum Dependency_Fraction
### the number of citations made from one country to another is simply the sum of the fractioned scores associated with each pair, with the sum across all possible pairs adding up to the total number of citations made at the world level.
dependency_summary <- cran_github_rdi_grouped %>%
group_by(Cited_Country, Citing_Country) %>%
summarize(Total_Dependency_Fraction = sum(Dependency_Fraction, na.rm = TRUE))
sum(dependency_summary$Total_Dependency_Fraction)
[1] 589
# Group by Cited Country and sum Total_Dependency_Fraction - total number of citations attributed to each country
citations_by_country <- dependency_summary %>%
group_by(Cited_Country) %>%
summarize(Fraction_of_Citations = round(sum(Total_Dependency_Fraction, na.rm = TRUE), 4))
sum(citations_by_country$Fraction_of_Citations)
[1] 589.0001
citations_by_country$Denominator_RDI <- round(citations_by_country$Fraction_of_Citations / sum(citations_by_country$Fraction_of_Citations),4)
# Group by citing country and sum Total_Dependency_Fraction - total number of citations made by each country
citings_by_country <- dependency_summary %>%
group_by(Citing_Country) %>%
summarize(Fraction_of_Citings = round(sum(Total_Dependency_Fraction, na.rm = TRUE), 4))
sum(citings_by_country$Fraction_of_Citings)
[1] 588.9999
# join citings by country with dependency_summary
citings_dependency_summary <- citings_by_country %>%
full_join(dependency_summary, by = "Citing_Country")
citings_dependency_summary$Numerator_RDI <- round(citings_dependency_summary$Total_Dependency_Fraction / citings_dependency_summary$Fraction_of_Citings,4)
## join denominator_RDI
citations_citings_dependency_summary <- citations_by_country %>%
full_join(citings_dependency_summary, by = "Cited_Country") %>%
select(Citing_Country, Cited_Country, Numerator_RDI, Denominator_RDI)
citations_citings_dependency_summary$RDI <- round(citations_citings_dependency_summary$Numerator_RDI / citations_citings_dependency_summary$Denominator_RDI,4)
dependency_summary %>%
arrange(desc(Total_Dependency_Fraction))%>%
kbl(caption = "Country Pair Dependency Weights", escape = F)%>%
kable_classic()%>%
kable_styling(font_size = 12, full_width = T)%>%
row_spec(0, bold = T, background = westat_blue(), color = "white")%>%
column_spec(1:2, border_right = T)%>%
scroll_box(width = "100%", height = "500px")
| Cited_Country | Citing_Country | Total_Dependency_Fraction |
|---|---|---|
| Unknown | Unknown | 87.3146359 |
| United States | Unknown | 64.3139658 |
| United States | United States | 35.7966513 |
| Unknown | United States | 26.3714621 |
| France | Unknown | 14.4800845 |
| Unknown | Germany | 12.1686281 |
| Unknown | United Kingdom | 11.7170320 |
| Germany | Unknown | 10.3942012 |
| Norway | Norway | 8.2998583 |
| United States | Germany | 7.9622908 |
| Germany | Germany | 7.6496787 |
| Unknown | Netherlands | 6.8692472 |
| United States | Spain | 6.8043969 |
| Denmark | Unknown | 6.7001129 |
| Unknown | Canada | 6.3906341 |
| Canada | Unknown | 6.3692538 |
| United States | Italy | 6.1850725 |
| United States | Australia | 6.1608042 |
| Unknown | France | 5.9665531 |
| Unknown | Australia | 5.7756594 |
| Bulgaria | Unknown | 5.2967973 |
| United States | United Kingdom | 5.1454252 |
| United States | France | 4.8013768 |
| Germany | United States | 4.7021219 |
| Unknown | New Zealand | 4.4227168 |
| Unknown | Belgium | 4.1871672 |
| United Kingdom | Unknown | 4.1108012 |
| Netherlands | Unknown | 3.6034512 |
| Austria | United States | 3.5561262 |
| Unknown | Italy | 3.3489010 |
| Denmark | Denmark | 3.1556103 |
| France | United States | 3.1419765 |
| Unknown | Poland | 3.0670811 |
| Germany | Canada | 2.8483983 |
| Denmark | United States | 2.6590604 |
| Australia | Unknown | 2.6535606 |
| Unknown | Brazil | 2.6226700 |
| United States | Netherlands | 2.6049659 |
| Australia | Australia | 2.5753766 |
| United States | Canada | 2.5707256 |
| Canada | Canada | 2.5509336 |
| Germany | Australia | 2.5390527 |
| Norway | Unknown | 2.4723814 |
| Denmark | Germany | 2.3466958 |
| France | France | 2.1811472 |
| United States | Switzerland | 2.1505801 |
| United States | South Korea | 2.0812051 |
| Unknown | Ireland | 2.0721542 |
| Unknown | Denmark | 2.0536379 |
| United States | New Zealand | 2.0487277 |
| United States | Ireland | 2.0308880 |
| United Kingdom | Netherlands | 2.0298811 |
| United States | Brazil | 2.0081528 |
| Italy | Italy | 2.0003613 |
| Hong Kong | Hong Kong | 2.0000000 |
| Romania | Romania | 2.0000000 |
| Unknown | Sweden | 1.9932171 |
| Australia | United States | 1.9867464 |
| Israel | Unknown | 1.9366085 |
| United States | Denmark | 1.9037860 |
| Canada | United States | 1.8759120 |
| Germany | Spain | 1.8035616 |
| Germany | United Kingdom | 1.7853307 |
| Colombia | Unknown | 1.7366579 |
| Netherlands | Netherlands | 1.7265773 |
| United States | Peru | 1.6861346 |
| Netherlands | Germany | 1.6542048 |
| United Kingdom | United States | 1.6175734 |
| Norway | United States | 1.5929450 |
| Netherlands | United States | 1.5654457 |
| Germany | Netherlands | 1.5589904 |
| Germany | Israel | 1.5411524 |
| Spain | Ecuador | 1.4848368 |
| United States | Belgium | 1.4739769 |
| United States | China | 1.4634399 |
| United Kingdom | United Kingdom | 1.4321661 |
| Denmark | United Kingdom | 1.4135857 |
| Unknown | Colombia | 1.3794315 |
| United States | Mauritius | 1.3733200 |
| Canada | United Kingdom | 1.3526687 |
| United States | Romania | 1.3411052 |
| Unknown | Spain | 1.3233865 |
| Germany | Norway | 1.3159755 |
| France | Netherlands | 1.3057481 |
| United States | Austria | 1.2618838 |
| Canada | New Zealand | 1.2552483 |
| France | Spain | 1.2341293 |
| Bulgaria | Denmark | 1.2013438 |
| United States | Czech Republic | 1.1903569 |
| Unknown | Peru | 1.1818778 |
| Spain | Unknown | 1.1622256 |
| Japan | Unknown | 1.1621650 |
| Spain | Italy | 1.1597043 |
| Denmark | Netherlands | 1.1553843 |
| United States | Taiwan | 1.1521244 |
| Sweden | United States | 1.1132434 |
| Switzerland | United Kingdom | 1.0602851 |
| France | United Kingdom | 1.0338293 |
| Unknown | Switzerland | 1.0211730 |
| China | Unknown | 1.0137011 |
| Brazil | Unknown | 1.0059937 |
| New Zealand | France | 1.0010112 |
| Denmark | Singapore | 1.0000000 |
| Russia | Russia | 1.0000000 |
| Israel | France | 0.9998031 |
| Spain | Norway | 0.9992790 |
| Unknown | Fiji | 0.9977071 |
| Israel | United States | 0.9972796 |
| Netherlands | New Caledonia | 0.9862442 |
| Austria | Poland | 0.9847822 |
| Denmark | Poland | 0.9838720 |
| Italy | Australia | 0.9748095 |
| Switzerland | Unknown | 0.9725535 |
| Unknown | Austria | 0.9647271 |
| France | New Caledonia | 0.9581810 |
| France | Hong Kong | 0.9545580 |
| Finland | Unknown | 0.9527975 |
| Poland | France | 0.9423647 |
| Unknown | Portugal | 0.9380079 |
| United States | Nigeria | 0.9345863 |
| Unknown | Czech Republic | 0.9224215 |
| Netherlands | China | 0.9220582 |
| Germany | Colombia | 0.8995402 |
| Germany | Italy | 0.8971911 |
| France | Germany | 0.8920140 |
| Belgium | Canada | 0.8573285 |
| Canada | Sweden | 0.8527621 |
| United States | Israel | 0.8506106 |
| Italy | Ecuador | 0.8386308 |
| Canada | Australia | 0.7810893 |
| Bulgaria | United States | 0.7757486 |
| Germany | Czech Republic | 0.7691053 |
| Denmark | Australia | 0.7651278 |
| France | Canada | 0.7521236 |
| Greece | Unknown | 0.7242482 |
| Unknown | Norway | 0.7230468 |
| Lithuania | Switzerland | 0.7128352 |
| Australia | Russia | 0.7117685 |
| United States | Lithuania | 0.6849539 |
| Germany | South Korea | 0.6785448 |
| United States | Poland | 0.6559873 |
| Germany | France | 0.6452659 |
| Sweden | Unknown | 0.6360338 |
| Unknown | Singapore | 0.6316281 |
| Bulgaria | Poland | 0.6177956 |
| Germany | Chile | 0.6159934 |
| Bulgaria | Austria | 0.6148564 |
| Australia | Spain | 0.6140359 |
| Norway | Spain | 0.5967587 |
| Norway | Canada | 0.5853739 |
| Unknown | South Korea | 0.5825402 |
| Bulgaria | France | 0.5774379 |
| France | Belgium | 0.5595327 |
| Canada | Germany | 0.5285778 |
| Unknown | Curaçao | 0.5121700 |
| Austria | Unknown | 0.5114360 |
| United States | Sweden | 0.5072172 |
| Netherlands | Peru | 0.5069443 |
| France | Ireland | 0.4821894 |
| Germany | Switzerland | 0.4633026 |
| Finland | Germany | 0.4559689 |
| Finland | Norway | 0.4482860 |
| United States | Colombia | 0.4478594 |
| United States | Norway | 0.4472529 |
| United States | New Caledonia | 0.4326497 |
| United States | Uruguay | 0.4324207 |
| Australia | Denmark | 0.4310748 |
| United States | Panama | 0.4309000 |
| Japan | United States | 0.4096919 |
| Unknown | China | 0.4071286 |
| France | Colombia | 0.4012124 |
| Norway | Australia | 0.3983430 |
| Colombia | Germany | 0.3970159 |
| France | Switzerland | 0.3871114 |
| France | Peru | 0.3841060 |
| Bulgaria | Canada | 0.3784731 |
| Switzerland | United States | 0.3773445 |
| Canada | Netherlands | 0.3664586 |
| Unknown | Saudi Arabia | 0.3570105 |
| Bulgaria | Netherlands | 0.3368614 |
| Greece | Netherlands | 0.3339926 |
| Denmark | New Zealand | 0.3323449 |
| Bulgaria | United Kingdom | 0.3290682 |
| Norway | South Korea | 0.3219288 |
| Spain | United States | 0.3163808 |
| France | Romania | 0.3092703 |
| Netherlands | Ireland | 0.3083117 |
| Sweden | Denmark | 0.3068182 |
| Australia | Israel | 0.2985812 |
| Netherlands | Lithuania | 0.2962873 |
| Canada | Italy | 0.2948238 |
| Switzerland | Italy | 0.2919480 |
| Unknown | New Caledonia | 0.2887299 |
| Canada | Colombia | 0.2822796 |
| Unknown | Uruguay | 0.2717018 |
| Unknown | Panama | 0.2707463 |
| Netherlands | Switzerland | 0.2706131 |
| Unknown | Taiwan | 0.2691071 |
| France | Mauritius | 0.2640198 |
| Switzerland | Netherlands | 0.2574750 |
| Bulgaria | China | 0.2567134 |
| Bulgaria | Taiwan | 0.2526643 |
| Sweden | Sweden | 0.2498985 |
| Denmark | Canada | 0.2355911 |
| Unknown | Mauritius | 0.2304751 |
| Unknown | Romania | 0.2230253 |
| Denmark | Italy | 0.2197927 |
| Colombia | Spain | 0.2175700 |
| Denmark | Austria | 0.2122025 |
| France | Brazil | 0.2111520 |
| Unknown | Greece | 0.2090994 |
| Denmark | Norway | 0.2075330 |
| Bulgaria | Czech Republic | 0.2022632 |
| Denmark | China | 0.2005919 |
| Australia | New Caledonia | 0.1962969 |
| Canada | Czech Republic | 0.1952466 |
| Japan | Australia | 0.1880885 |
| Denmark | Peru | 0.1878818 |
| Switzerland | Switzerland | 0.1844749 |
| Poland | Unknown | 0.1843025 |
| Canada | Norway | 0.1813057 |
| Denmark | Belgium | 0.1758419 |
| Denmark | Brazil | 0.1752140 |
| United Kingdom | Italy | 0.1630648 |
| Canada | Austria | 0.1621032 |
| Netherlands | Italy | 0.1619164 |
| Canada | China | 0.1577323 |
| Australia | France | 0.1505368 |
| France | Sweden | 0.1497168 |
| Belgium | Unknown | 0.1493700 |
| Denmark | Ireland | 0.1483553 |
| France | Australia | 0.1465289 |
| Canada | Peru | 0.1457408 |
| Japan | Germany | 0.1442236 |
| Japan | United Kingdom | 0.1423299 |
| Canada | Denmark | 0.1404967 |
| Denmark | New Caledonia | 0.1374681 |
| Denmark | Uruguay | 0.1372391 |
| Denmark | Panama | 0.1367564 |
| Denmark | South Korea | 0.1365204 |
| Unknown | Ecuador | 0.1360882 |
| Denmark | Taiwan | 0.1352712 |
| Canada | Brazil | 0.1349812 |
| Norway | United Kingdom | 0.1315918 |
| United States | Greece | 0.1260161 |
| Colombia | Greece | 0.1258970 |
| France | Italy | 0.1192478 |
| Australia | New Zealand | 0.1169271 |
| Colombia | United States | 0.1139295 |
| Canada | Ireland | 0.1109200 |
| Sweden | Germany | 0.1107721 |
| Colombia | United Kingdom | 0.1096227 |
| Netherlands | Australia | 0.1086190 |
| Canada | Taiwan | 0.1079066 |
| Canada | Poland | 0.1071618 |
| Canada | South Korea | 0.1052102 |
| Norway | Germany | 0.1049158 |
| Canada | New Caledonia | 0.1047443 |
| Canada | Uruguay | 0.1047443 |
| Canada | Panama | 0.1043759 |
| Switzerland | Austria | 0.1006642 |
| Switzerland | Norway | 0.1006271 |
| Bulgaria | Germany | 0.0961149 |
| China | United Kingdom | 0.0934190 |
| China | Australia | 0.0928251 |
| China | Romania | 0.0926876 |
| Australia | Netherlands | 0.0900973 |
| United States | Iran | 0.0895339 |
| Colombia | Switzerland | 0.0866403 |
| Lithuania | Unknown | 0.0850502 |
| Switzerland | Canada | 0.0831040 |
| Norway | Italy | 0.0777531 |
| Norway | Czech Republic | 0.0776187 |
| Norway | Israel | 0.0775624 |
| Switzerland | Germany | 0.0753103 |
| Denmark | Switzerland | 0.0744288 |
| Switzerland | Israel | 0.0671474 |
| Japan | Netherlands | 0.0670224 |
| Bulgaria | Italy | 0.0664073 |
| Unknown | Israel | 0.0642599 |
| Netherlands | United Kingdom | 0.0642527 |
| Canada | Belgium | 0.0635688 |
| Norway | Chile | 0.0623432 |
| United States | Portugal | 0.0618976 |
| Belgium | United States | 0.0613799 |
| Switzerland | Australia | 0.0612307 |
| Switzerland | Czech Republic | 0.0606923 |
| Canada | Switzerland | 0.0589714 |
| Germany | New Zealand | 0.0584768 |
| Belgium | Australia | 0.0569835 |
| Switzerland | Saudi Arabia | 0.0568921 |
| Sweden | Greece | 0.0561070 |
| Norway | France | 0.0558811 |
| France | Denmark | 0.0556274 |
| France | Iran | 0.0512067 |
| Unknown | Russia | 0.0505592 |
| Japan | Austria | 0.0505173 |
| Denmark | Czech Republic | 0.0503192 |
| Australia | Italy | 0.0489262 |
| Switzerland | Chile | 0.0485827 |
| Japan | China | 0.0476934 |
| New Zealand | Unknown | 0.0469499 |
| Japan | Peru | 0.0462711 |
| Japan | Canada | 0.0448302 |
| Switzerland | France | 0.0445385 |
| Canada | France | 0.0441761 |
| Colombia | Brazil | 0.0440714 |
| Japan | Denmark | 0.0439129 |
| Japan | Brazil | 0.0425713 |
| United States | Russia | 0.0414637 |
| United Kingdom | France | 0.0409140 |
| Japan | New Zealand | 0.0388648 |
| Italy | Unknown | 0.0388300 |
| Unknown | Iran | 0.0377344 |
| France | New Zealand | 0.0372241 |
| Japan | Italy | 0.0370687 |
| Colombia | Netherlands | 0.0366625 |
| Germany | Sweden | 0.0362615 |
| Canada | Israel | 0.0358376 |
| Bulgaria | Spain | 0.0342123 |
| Japan | Norway | 0.0338562 |
| Japan | South Korea | 0.0333496 |
| Japan | Poland | 0.0333440 |
| Japan | Uruguay | 0.0326517 |
| Japan | New Caledonia | 0.0326517 |
| Japan | Panama | 0.0325369 |
| Japan | Taiwan | 0.0321835 |
| Germany | Belgium | 0.0319093 |
| France | Argentina | 0.0308185 |
| Poland | Denmark | 0.0294982 |
| United Kingdom | Singapore | 0.0294045 |
| Bulgaria | Nigeria | 0.0289694 |
| United Kingdom | New Caledonia | 0.0286144 |
| Unknown | Croatia | 0.0280926 |
| Canada | Chile | 0.0280021 |
| Australia | Norway | 0.0262178 |
| Spain | France | 0.0262091 |
| Poland | United States | 0.0252270 |
| United Kingdom | Hong Kong | 0.0249606 |
| Spain | Spain | 0.0243645 |
| United Kingdom | Canada | 0.0240402 |
| Australia | Germany | 0.0237971 |
| Japan | Ireland | 0.0237783 |
| Netherlands | Brazil | 0.0237204 |
| Germany | Peru | 0.0223014 |
| Netherlands | Austria | 0.0221971 |
| Australia | United Kingdom | 0.0214205 |
| Japan | Belgium | 0.0208931 |
| Bulgaria | Belgium | 0.0202719 |
| Netherlands | Canada | 0.0199530 |
| New Zealand | United States | 0.0198287 |
| Netherlands | South Korea | 0.0196425 |
| Unknown | Vietnam | 0.0193056 |
| Netherlands | Denmark | 0.0192819 |
| United Kingdom | Australia | 0.0189649 |
| United Kingdom | Germany | 0.0182602 |
| Austria | Australia | 0.0179254 |
| Poland | Israel | 0.0177945 |
| France | South Korea | 0.0175913 |
| Germany | Romania | 0.0167968 |
| United States | Curaçao | 0.0167415 |
| United Kingdom | Norway | 0.0167285 |
| Unknown | Hong Kong | 0.0165701 |
| Finland | United States | 0.0165472 |
| Netherlands | Norway | 0.0165328 |
| Japan | Switzerland | 0.0164776 |
| Bulgaria | Ireland | 0.0164602 |
| Belgium | Germany | 0.0161245 |
| Bulgaria | Peru | 0.0159461 |
| Taiwan | Unknown | 0.0158341 |
| South Korea | Sweden | 0.0158018 |
| Australia | Czech Republic | 0.0157652 |
| New Zealand | Australia | 0.0152377 |
| Poland | Poland | 0.0151597 |
| Poland | Austria | 0.0151359 |
| Austria | United Kingdom | 0.0151351 |
| Italy | Spain | 0.0148336 |
| Germany | Mauritius | 0.0146504 |
| Netherlands | Poland | 0.0146380 |
| Italy | Colombia | 0.0143472 |
| Netherlands | New Zealand | 0.0143121 |
| Netherlands | Uruguay | 0.0143048 |
| Netherlands | Panama | 0.0142545 |
| Netherlands | Taiwan | 0.0142112 |
| United Kingdom | Ireland | 0.0142092 |
| Netherlands | Belgium | 0.0139892 |
| United States | South Africa | 0.0134590 |
| Germany | Saudi Arabia | 0.0133996 |
| Poland | Netherlands | 0.0129269 |
| Germany | Poland | 0.0128339 |
| Taiwan | United States | 0.0125941 |
| Spain | Poland | 0.0125781 |
| Australia | Chile | 0.0125428 |
| Taiwan | Australia | 0.0119497 |
| United States | Hungary | 0.0119127 |
| Greece | United States | 0.0117563 |
| Denmark | France | 0.0117198 |
| Colombia | Australia | 0.0115513 |
| Japan | Czech Republic | 0.0114539 |
| Chile | Unknown | 0.0114306 |
| Greece | Germany | 0.0113962 |
| Poland | Czech Republic | 0.0112703 |
| Unknown | Chile | 0.0110268 |
| United Kingdom | Colombia | 0.0104980 |
| Bulgaria | Australia | 0.0104409 |
| United Kingdom | Czech Republic | 0.0102863 |
| United Kingdom | Israel | 0.0102854 |
| Bulgaria | New Zealand | 0.0102791 |
| Poland | Canada | 0.0101044 |
| Colombia | New Zealand | 0.0100556 |
| Spain | United Kingdom | 0.0096010 |
| Australia | Nigeria | 0.0092871 |
| United States | Saudi Arabia | 0.0092137 |
| Colombia | Denmark | 0.0090860 |
| Bulgaria | Lithuania | 0.0090474 |
| Germany | Brazil | 0.0087518 |
| United Kingdom | Spain | 0.0082342 |
| Poland | United Kingdom | 0.0081089 |
| Spain | Brazil | 0.0080278 |
| Bulgaria | Switzerland | 0.0079930 |
| Unknown | Nigeria | 0.0079273 |
| United Kingdom | Chile | 0.0078933 |
| Bulgaria | Romania | 0.0073016 |
| Bulgaria | Mauritius | 0.0069345 |
| Unknown | South Africa | 0.0069098 |
| Unknown | Finland | 0.0067922 |
| Spain | South Korea | 0.0066939 |
| France | Poland | 0.0066835 |
| Finland | Belgium | 0.0065901 |
| Poland | Italy | 0.0064869 |
| Norway | Saudi Arabia | 0.0063399 |
| Ukraine | Unknown | 0.0063074 |
| Singapore | Australia | 0.0060268 |
| France | Austria | 0.0058384 |
| France | Norway | 0.0056087 |
| Denmark | Nigeria | 0.0055300 |
| United States | Chile | 0.0055177 |
| Denmark | Israel | 0.0055002 |
| Germany | Denmark | 0.0054771 |
| Spain | Germany | 0.0053937 |
| Austria | France | 0.0052599 |
| New Caledonia | Unknown | 0.0052364 |
| Netherlands | Czech Republic | 0.0052323 |
| Italy | United States | 0.0051895 |
| Denmark | Spain | 0.0051378 |
| France | China | 0.0051080 |
| France | Israel | 0.0048205 |
| Portugal | Unknown | 0.0046360 |
| Spain | Canada | 0.0045777 |
| Finland | Netherlands | 0.0045139 |
| Finland | Peru | 0.0045076 |
| Denmark | Chile | 0.0042128 |
| Colombia | France | 0.0041863 |
| Colombia | Sweden | 0.0040938 |
| Singapore | Unknown | 0.0040324 |
| Chile | United States | 0.0040093 |
| Ukraine | Italy | 0.0039478 |
| Germany | Ecuador | 0.0038602 |
| Switzerland | Ecuador | 0.0038602 |
| Austria | Netherlands | 0.0037632 |
| United Kingdom | Nigeria | 0.0036867 |
| Poland | Norway | 0.0036790 |
| Switzerland | Spain | 0.0036213 |
| Austria | New Caledonia | 0.0036085 |
| Austria | Hong Kong | 0.0036029 |
| Japan | France | 0.0035910 |
| Poland | Germany | 0.0035711 |
| Denmark | Sweden | 0.0035000 |
| Spain | Switzerland | 0.0034732 |
| France | Uruguay | 0.0033940 |
| Uganda | Unknown | 0.0033897 |
| France | Panama | 0.0033821 |
| France | Taiwan | 0.0033638 |
| United States | Croatia | 0.0033457 |
| Italy | Norway | 0.0033121 |
| United Kingdom | South Korea | 0.0032662 |
| Sweden | Australia | 0.0032072 |
| United Kingdom | Brazil | 0.0031792 |
| Brazil | United States | 0.0031430 |
| United Kingdom | New Zealand | 0.0031081 |
| Finland | Spain | 0.0031065 |
| Finland | Romania | 0.0031065 |
| Spain | Australia | 0.0030526 |
| United Arab Emirates | Unknown | 0.0030249 |
| Belgium | Brazil | 0.0030008 |
| Spain | New Zealand | 0.0029926 |
| Finland | United Kingdom | 0.0029745 |
| Unknown | Lithuania | 0.0029621 |
| Finland | Mauritius | 0.0029502 |
| Mexico | Unknown | 0.0029216 |
| Japan | Spain | 0.0028533 |
| Argentina | Unknown | 0.0028179 |
| Germany | Iran | 0.0028132 |
| Canada | Nigeria | 0.0027801 |
| Belgium | South Korea | 0.0027615 |
| Austria | Canada | 0.0027188 |
| Canada | Spain | 0.0026913 |
| Kenya | Unknown | 0.0026790 |
| New Zealand | Germany | 0.0026715 |
| France | Czech Republic | 0.0026421 |
| Netherlands | France | 0.0025869 |
| China | United States | 0.0025423 |
| United Kingdom | Denmark | 0.0025162 |
| United States | Singapore | 0.0025121 |
| Finland | France | 0.0024718 |
| United Kingdom | Sweden | 0.0024510 |
| Poland | Australia | 0.0024258 |
| New Zealand | Spain | 0.0024141 |
| China | Spain | 0.0023455 |
| Unknown | Japan | 0.0023344 |
| New Zealand | United Kingdom | 0.0023064 |
| Ukraine | United States | 0.0022998 |
| Spain | Ireland | 0.0022905 |
| Spain | Netherlands | 0.0022856 |
| Portugal | Belgium | 0.0022806 |
| Belgium | Peru | 0.0022471 |
| Colombia | Poland | 0.0021876 |
| Netherlands | Spain | 0.0021545 |
| Italy | Germany | 0.0020929 |
| New Zealand | Ireland | 0.0020238 |
| Canada | Curaçao | 0.0020178 |
| Portugal | United States | 0.0020133 |
| Netherlands | Romania | 0.0020044 |
| Switzerland | Belgium | 0.0019900 |
| Greenland | Unknown | 0.0019842 |
| Spain | Belgium | 0.0019662 |
| Antarctica | Unknown | 0.0018835 |
| Norway | Brazil | 0.0018735 |
| New Zealand | Lithuania | 0.0018420 |
| Netherlands | Mauritius | 0.0018407 |
| Finland | Denmark | 0.0018355 |
| Chile | Australia | 0.0018312 |
| South Africa | Unknown | 0.0018268 |
| Brazil | New Zealand | 0.0018180 |
| Austria | Ireland | 0.0017710 |
| Poland | Chile | 0.0017650 |
| Sweden | Netherlands | 0.0017490 |
| Japan | Nigeria | 0.0017124 |
| Italy | Israel | 0.0016718 |
| Sweden | United Kingdom | 0.0016704 |
| Denmark | Croatia | 0.0016311 |
| Antarctica | United States | 0.0016147 |
| Belgium | New Caledonia | 0.0016039 |
| India | Mauritius | 0.0015996 |
| Bulgaria | Brazil | 0.0015963 |
| Uruguay | Canada | 0.0015718 |
| Germany | Austria | 0.0015611 |
| Sweden | Spain | 0.0015609 |
| Spain | China | 0.0015585 |
| Sweden | France | 0.0015392 |
| Australia | Switzerland | 0.0015271 |
| United Kingdom | Austria | 0.0015167 |
| Austria | Colombia | 0.0015143 |
| Portugal | Netherlands | 0.0015132 |
| Spain | Taiwan | 0.0015128 |
| United Arab Emirates | United States | 0.0015084 |
| United Arab Emirates | Australia | 0.0014849 |
| Bulgaria | Sweden | 0.0014771 |
| Austria | Germany | 0.0014466 |
| Sweden | Ireland | 0.0014365 |
| Chile | United Kingdom | 0.0014351 |
| Belgium | Spain | 0.0014311 |
| Chile | Germany | 0.0014294 |
| Brazil | United Kingdom | 0.0014177 |
| Sweden | Saudi Arabia | 0.0014128 |
| Germany | Ireland | 0.0014026 |
| China | Germany | 0.0013954 |
| Finland | Australia | 0.0013888 |
| Spain | Peru | 0.0013796 |
| United Kingdom | Poland | 0.0013763 |
| Bulgaria | Iran | 0.0013449 |
| Bulgaria | South Korea | 0.0013220 |
| United Kingdom | Peru | 0.0013107 |
| China | Denmark | 0.0012872 |
| Finland | Canada | 0.0012629 |
| New Zealand | New Zealand | 0.0012489 |
| Australia | Peru | 0.0012300 |
| Colombia | Belgium | 0.0012251 |
| Austria | Belgium | 0.0012134 |
| Sweden | Switzerland | 0.0011991 |
| Japan | Romania | 0.0011935 |
| Uganda | United States | 0.0011899 |
| New Zealand | Italy | 0.0011746 |
| Singapore | Germany | 0.0011616 |
| Japan | Israel | 0.0011439 |
| Italy | Czech Republic | 0.0011382 |
| Germany | Lithuania | 0.0011377 |
| Japan | Mauritius | 0.0011335 |
| New Zealand | Norway | 0.0010773 |
| France | Chile | 0.0010677 |
| Brazil | Australia | 0.0010652 |
| China | Switzerland | 0.0010574 |
| China | Netherlands | 0.0010487 |
| Ukraine | Germany | 0.0010473 |
| Brazil | Germany | 0.0010455 |
| New Zealand | Switzerland | 0.0010339 |
| United Kingdom | Switzerland | 0.0010188 |
| Japan | Sweden | 0.0010038 |
| India | Unknown | 0.0009953 |
| Italy | France | 0.0009951 |
| Mexico | United States | 0.0009933 |
| Turkey | Unknown | 0.0009694 |
| Finland | Austria | 0.0009617 |
| Argentina | United States | 0.0009523 |
| Canada | South Africa | 0.0009517 |
| Brazil | France | 0.0009503 |
| Brazil | Ireland | 0.0009488 |
| Kenya | United States | 0.0009271 |
| Ukraine | Australia | 0.0009261 |
| Finland | Poland | 0.0009261 |
| Belgium | United Kingdom | 0.0009257 |
| New Zealand | Peru | 0.0009190 |
| China | France | 0.0009143 |
| Taiwan | Spain | 0.0009137 |
| Italy | Chile | 0.0009114 |
| India | United States | 0.0008787 |
| Italy | Brazil | 0.0008740 |
| New Zealand | Netherlands | 0.0008702 |
| Colombia | Peru | 0.0008680 |
| Norway | Netherlands | 0.0008580 |
| Norway | Peru | 0.0008514 |
| Germany | Mexico | 0.0008489 |
| United Kingdom | China | 0.0008407 |
| Colombia | Canada | 0.0008375 |
| Austria | Peru | 0.0008286 |
| Bulgaria | Norway | 0.0008100 |
| Bulgaria | New Caledonia | 0.0008099 |
| Bulgaria | Uruguay | 0.0008099 |
| Bulgaria | Panama | 0.0008070 |
| United Kingdom | Argentina | 0.0008059 |
| New Zealand | Brazil | 0.0007711 |
| United Kingdom | Russia | 0.0007700 |
| Germany | New Caledonia | 0.0007478 |
| Australia | Brazil | 0.0007444 |
| Ukraine | United Kingdom | 0.0007331 |
| New Zealand | Israel | 0.0007185 |
| Greenland | United States | 0.0006965 |
| Italy | South Korea | 0.0006902 |
| New Zealand | Canada | 0.0006897 |
| Czech Republic | Unknown | 0.0006842 |
| Chile | Netherlands | 0.0006695 |
| Canada | Romania | 0.0006688 |
| Colombia | Romania | 0.0006599 |
| Netherlands | Sweden | 0.0006440 |
| United Kingdom | Taiwan | 0.0006393 |
| Sweden | Russia | 0.0006311 |
| Switzerland | Nigeria | 0.0006245 |
| China | Poland | 0.0006180 |
| Switzerland | Brazil | 0.0006172 |
| China | Austria | 0.0006165 |
| Italy | Switzerland | 0.0006101 |
| France | Curaçao | 0.0006048 |
| Austria | Spain | 0.0006045 |
| Austria | Romania | 0.0006045 |
| China | Canada | 0.0005998 |
| China | Italy | 0.0005960 |
| New Zealand | Czech Republic | 0.0005925 |
| United Kingdom | Belgium | 0.0005866 |
| Japan | Chile | 0.0005787 |
| Austria | Mauritius | 0.0005741 |
| Finland | Iran | 0.0005722 |
| Antarctica | France | 0.0005699 |
| Israel | Sweden | 0.0005644 |
| New Caledonia | United States | 0.0005570 |
| France | Saudi Arabia | 0.0005523 |
| Uganda | Australia | 0.0005465 |
| Unknown | Argentina | 0.0005420 |
| Mexico | Australia | 0.0005396 |
| Netherlands | Israel | 0.0005282 |
| Uzbekistan | Unknown | 0.0005236 |
| Italy | United Kingdom | 0.0005235 |
| Antarctica | Germany | 0.0005186 |
| Antarctica | Ireland | 0.0005173 |
| Chile | Austria | 0.0005113 |
| Spain | Romania | 0.0005106 |
| Antarctica | Australia | 0.0005023 |
| Belgium | Switzerland | 0.0004956 |
| United States | Vietnam | 0.0004873 |
| Spain | Mauritius | 0.0004849 |
| Chile | China | 0.0004830 |
| China | Brazil | 0.0004818 |
| Australia | Canada | 0.0004706 |
| Chile | Peru | 0.0004558 |
| Chile | Canada | 0.0004524 |
| Finland | Sweden | 0.0004471 |
| Chile | Denmark | 0.0004428 |
| Austria | Sweden | 0.0004346 |
| New Zealand | Denmark | 0.0004333 |
| Argentina | Australia | 0.0004301 |
| Australia | Austria | 0.0004301 |
| Netherlands | South Africa | 0.0004290 |
| Uganda | United Kingdom | 0.0004276 |
| Kenya | Australia | 0.0004257 |
| Brazil | Switzerland | 0.0004257 |
| Australia | Singapore | 0.0004242 |
| Germany | China | 0.0004241 |
| Netherlands | Chile | 0.0004224 |
| Chile | Brazil | 0.0004217 |
| Uganda | Germany | 0.0004207 |
| Sweden | Belgium | 0.0004170 |
| Kenya | Italy | 0.0004158 |
| Australia | Belgium | 0.0004111 |
| Australia | China | 0.0004104 |
| New Zealand | Chile | 0.0004022 |
| Norway | Mexico | 0.0004020 |
| Switzerland | Peru | 0.0003968 |
| Greece | Italy | 0.0003939 |
| Mexico | Germany | 0.0003866 |
| United Kingdom | Uruguay | 0.0003857 |
| New Zealand | Austria | 0.0003849 |
| United Kingdom | Panama | 0.0003843 |
| Denmark | Colombia | 0.0003788 |
| Hungary | Unknown | 0.0003641 |
| Mexico | Mauritius | 0.0003635 |
| Taiwan | Switzerland | 0.0003634 |
| Mexico | United Kingdom | 0.0003591 |
| New Zealand | China | 0.0003576 |
| Netherlands | Iran | 0.0003570 |
| Spain | Denmark | 0.0003525 |
| Canada | Lithuania | 0.0003433 |
| Chile | Poland | 0.0003373 |
| Argentina | United Kingdom | 0.0003362 |
| Argentina | Germany | 0.0003359 |
| Chile | Norway | 0.0003308 |
| Chile | New Zealand | 0.0003307 |
| Chile | New Caledonia | 0.0003307 |
| Chile | Uruguay | 0.0003307 |
| Greece | France | 0.0003306 |
| Kenya | United Kingdom | 0.0003302 |
| Chile | Panama | 0.0003295 |
| United States | Bulgaria | 0.0003287 |
| Kenya | Germany | 0.0003286 |
| Chile | South Korea | 0.0003284 |
| Chile | Italy | 0.0003263 |
| Sweden | Brazil | 0.0003262 |
| Chile | Taiwan | 0.0003259 |
| Ukraine | Netherlands | 0.0003244 |
| Colombia | Ireland | 0.0003226 |
| Greenland | Australia | 0.0003199 |
| China | New Caledonia | 0.0003148 |
| Sweden | Peru | 0.0003111 |
| China | Hong Kong | 0.0003084 |
| Singapore | Italy | 0.0003035 |
| Germany | Russia | 0.0003016 |
| Colombia | Mauritius | 0.0002995 |
| Finland | Czech Republic | 0.0002985 |
| Belgium | New Zealand | 0.0002953 |
| Australia | Poland | 0.0002931 |
| Germany | Uruguay | 0.0002898 |
| Ukraine | Peru | 0.0002887 |
| Germany | Panama | 0.0002887 |
| Australia | Taiwan | 0.0002886 |
| Germany | Taiwan | 0.0002857 |
| Colombia | Austria | 0.0002794 |
| Czech Republic | United States | 0.0002765 |
| France | Greece | 0.0002752 |
| Norway | Austria | 0.0002726 |
| Colombia | Italy | 0.0002699 |
| South Korea | Unknown | 0.0002657 |
| New Zealand | Romania | 0.0002650 |
| Switzerland | New Zealand | 0.0002595 |
| New Zealand | Poland | 0.0002587 |
| Australia | Uruguay | 0.0002552 |
| Norway | China | 0.0002544 |
| Australia | Panama | 0.0002543 |
| Australia | South Korea | 0.0002535 |
| Unknown | Hungary | 0.0002507 |
| Greenland | United Kingdom | 0.0002503 |
| Australia | Bulgaria | 0.0002501 |
| Brazil | Italy | 0.0002499 |
| China | Ireland | 0.0002488 |
| Ukraine | Austria | 0.0002488 |
| Colombia | China | 0.0002466 |
| Greenland | Germany | 0.0002463 |
| New Zealand | New Caledonia | 0.0002448 |
| New Zealand | Uruguay | 0.0002448 |
| New Zealand | Panama | 0.0002439 |
| New Zealand | South Korea | 0.0002431 |
| New Zealand | Taiwan | 0.0002413 |
| Chile | Ireland | 0.0002383 |
| Norway | Denmark | 0.0002368 |
| Italy | Netherlands | 0.0002359 |
| Bulgaria | Curaçao | 0.0002340 |
| Switzerland | Denmark | 0.0002328 |
| Ukraine | China | 0.0002322 |
| Belgium | Italy | 0.0002315 |
| Antarctica | Switzerland | 0.0002276 |
| Colombia | Curaçao | 0.0002251 |
| Norway | Ireland | 0.0002247 |
| Belgium | France | 0.0002243 |
| Japan | Iran | 0.0002198 |
| Ukraine | Canada | 0.0002178 |
| Ukraine | Denmark | 0.0002168 |
| Lithuania | Germany | 0.0002166 |
| Uruguay | Germany | 0.0002104 |
| South Korea | Italy | 0.0002093 |
| Taiwan | Germany | 0.0002048 |
| Belgium | Netherlands | 0.0002040 |
| Ukraine | Brazil | 0.0002026 |
| China | Czech Republic | 0.0002016 |
| Kosovo | Germany | 0.0002013 |
| Sweden | Romania | 0.0002003 |
| Uganda | Netherlands | 0.0001980 |
| Chile | Belgium | 0.0001971 |
| Spain | Sweden | 0.0001940 |
| Portugal | Germany | 0.0001919 |
| Peru | United Kingdom | 0.0001906 |
| Sweden | Mauritius | 0.0001903 |
| Taiwan | Brazil | 0.0001844 |
| Australia | Ireland | 0.0001842 |
| Uzbekistan | United States | 0.0001838 |
| Hungary | United States | 0.0001833 |
| Colombia | South Korea | 0.0001818 |
| Norway | Poland | 0.0001803 |
| Sweden | Israel | 0.0001802 |
| United States | Ecuador | 0.0001800 |
| Unknown | India | 0.0001791 |
| United States | Finland | 0.0001751 |
| Colombia | Norway | 0.0001746 |
| Norway | New Zealand | 0.0001742 |
| Norway | New Caledonia | 0.0001742 |
| Norway | Uruguay | 0.0001742 |
| Hungary | Australia | 0.0001739 |
| Norway | Panama | 0.0001736 |
| United Kingdom | Romania | 0.0001729 |
| Uruguay | Unknown | 0.0001728 |
| Norway | Taiwan | 0.0001717 |
| Italy | Sweden | 0.0001707 |
| Finland | Curaçao | 0.0001692 |
| Colombia | New Caledonia | 0.0001686 |
| Colombia | Uruguay | 0.0001686 |
| Colombia | Panama | 0.0001680 |
| Colombia | Taiwan | 0.0001661 |
| Czech Republic | Norway | 0.0001657 |
| Chile | Switzerland | 0.0001654 |
| Luxembourg | Unknown | 0.0001654 |
| United Kingdom | Mauritius | 0.0001642 |
| Mexico | Netherlands | 0.0001642 |
| Colombia | Lithuania | 0.0001625 |
| Ukraine | Poland | 0.0001623 |
| Germany | Nigeria | 0.0001612 |
| Ukraine | Norway | 0.0001590 |
| Ukraine | New Zealand | 0.0001589 |
| Ukraine | New Caledonia | 0.0001589 |
| Ukraine | Uruguay | 0.0001589 |
| Ukraine | Panama | 0.0001584 |
| Ukraine | South Korea | 0.0001578 |
| Argentina | Netherlands | 0.0001569 |
| Ukraine | Taiwan | 0.0001566 |
| Greece | Norway | 0.0001557 |
| New Zealand | Belgium | 0.0001537 |
| Kenya | Netherlands | 0.0001529 |
| Uganda | Austria | 0.0001526 |
| Singapore | France | 0.0001518 |
| Belgium | China | 0.0001514 |
| Singapore | Nigeria | 0.0001511 |
| Taiwan | Netherlands | 0.0001504 |
| Belgium | Taiwan | 0.0001489 |
| Mexico | Peru | 0.0001473 |
| Belgium | Poland | 0.0001472 |
| Finland | China | 0.0001468 |
| Uganda | China | 0.0001442 |
| Switzerland | South Korea | 0.0001428 |
| Lithuania | United States | 0.0001422 |
| Nigeria | Italy | 0.0001412 |
| Czech Republic | Germany | 0.0001405 |
| Singapore | Spain | 0.0001403 |
| United States | Turkey | 0.0001395 |
| Norway | Belgium | 0.0001390 |
| Ukraine | Belgium | 0.0001387 |
| Netherlands | Curaçao | 0.0001378 |
| Latvia | Unknown | 0.0001378 |
| Uganda | Canada | 0.0001347 |
| Switzerland | China | 0.0001340 |
| Uganda | Peru | 0.0001340 |
| Belgium | Sweden | 0.0001336 |
| Mexico | Italy | 0.0001331 |
| Hong Kong | Unknown | 0.0001330 |
| Uganda | Denmark | 0.0001321 |
| Norway | Switzerland | 0.0001316 |
| Czech Republic | Australia | 0.0001300 |
| Finland | Brazil | 0.0001300 |
| China | Colombia | 0.0001296 |
| Mexico | Austria | 0.0001266 |
| Uganda | Brazil | 0.0001259 |
| Switzerland | Sweden | 0.0001248 |
| Argentina | Austria | 0.0001236 |
| Australia | Romania | 0.0001225 |
| Germany | South Africa | 0.0001198 |
| Mexico | China | 0.0001196 |
| Denmark | South Africa | 0.0001190 |
| Kenya | Austria | 0.0001182 |
| Australia | Mauritius | 0.0001163 |
| Austria | Argentina | 0.0001163 |
| Greenland | Netherlands | 0.0001159 |
| Ukraine | Ireland | 0.0001148 |
| China | Belgium | 0.0001133 |
| Turkey | China | 0.0001127 |
| Argentina | China | 0.0001126 |
| Argentina | Denmark | 0.0001119 |
| Mexico | Canada | 0.0001117 |
| Kenya | China | 0.0001114 |
| Austria | Iran | 0.0001113 |
| Turkey | Taiwan | 0.0001111 |
| Mexico | Denmark | 0.0001096 |
| Chile | Czech Republic | 0.0001086 |
| Bulgaria | Bulgaria | 0.0001084 |
| Argentina | Canada | 0.0001079 |
| Finland | Italy | 0.0001050 |
| Argentina | Peru | 0.0001046 |
| Mexico | Brazil | 0.0001044 |
| Switzerland | Romania | 0.0001042 |
| Kenya | Canada | 0.0001041 |
| Kenya | Peru | 0.0001035 |
| Poland | China | 0.0001028 |
| Greece | Czech Republic | 0.0001022 |
| Kenya | Denmark | 0.0001020 |
| Czech Republic | Italy | 0.0001015 |
| Uganda | Poland | 0.0001007 |
| Finland | New Zealand | 0.0001003 |
| Finland | Uruguay | 0.0001003 |
| Finland | New Caledonia | 0.0001003 |
| Finland | Panama | 0.0001000 |
| Finland | South Korea | 0.0000996 |
| Unknown | Bulgaria | 0.0000995 |
| Switzerland | Mauritius | 0.0000989 |
| Finland | Taiwan | 0.0000989 |
| Czech Republic | Czech Republic | 0.0000989 |
| Uganda | Norway | 0.0000987 |
| Uganda | New Zealand | 0.0000987 |
| Uganda | New Caledonia | 0.0000987 |
| Uganda | Uruguay | 0.0000987 |
| Uganda | Panama | 0.0000984 |
| Argentina | Brazil | 0.0000983 |
| Uganda | South Korea | 0.0000981 |
| Czech Republic | Israel | 0.0000976 |
| Uganda | Italy | 0.0000974 |
| Uganda | Taiwan | 0.0000973 |
| Kenya | Brazil | 0.0000972 |
| Switzerland | Taiwan | 0.0000971 |
| Portugal | Peru | 0.0000955 |
| Spain | Iran | 0.0000940 |
| Canada | Mauritius | 0.0000940 |
| Greece | Australia | 0.0000939 |
| Greece | Israel | 0.0000939 |
| Niue | Australia | 0.0000918 |
| Italy | Poland | 0.0000917 |
| Italy | Portugal | 0.0000917 |
| France | Bulgaria | 0.0000915 |
| Australia | South Africa | 0.0000907 |
| Denmark | Lithuania | 0.0000905 |
| Japan | Colombia | 0.0000901 |
| Switzerland | Poland | 0.0000900 |
| Greenland | Austria | 0.0000894 |
| Singapore | Colombia | 0.0000883 |
| Sweden | Canada | 0.0000878 |
| Unknown | Turkey | 0.0000877 |
| South Africa | United States | 0.0000871 |
| Italy | Denmark | 0.0000865 |
| Uzbekistan | Australia | 0.0000844 |
| Greenland | China | 0.0000844 |
| Mexico | Poland | 0.0000835 |
| Argentina | Poland | 0.0000831 |
| Argentina | Norway | 0.0000831 |
| Kenya | Norway | 0.0000823 |
| Ukraine | Switzerland | 0.0000820 |
| Mexico | Norway | 0.0000819 |
| Mexico | New Zealand | 0.0000819 |
| Mexico | New Caledonia | 0.0000819 |
| Mexico | Uruguay | 0.0000819 |
| Nepal | United States | 0.0000819 |
| Mexico | Panama | 0.0000816 |
| Mexico | South Korea | 0.0000813 |
| Mexico | Taiwan | 0.0000807 |
| Belgium | Nigeria | 0.0000806 |
| Spain | Austria | 0.0000805 |
| Argentina | Italy | 0.0000796 |
| Poland | Romania | 0.0000795 |
| Bulgaria | Croatia | 0.0000792 |
| Greenland | Canada | 0.0000789 |
| Greenland | Peru | 0.0000784 |
| Czech Republic | Chile | 0.0000781 |
| Russia | Unknown | 0.0000780 |
| Kenya | Poland | 0.0000778 |
| Greenland | Denmark | 0.0000773 |
| Argentina | New Zealand | 0.0000771 |
| Argentina | New Caledonia | 0.0000771 |
| Argentina | Uruguay | 0.0000771 |
| Argentina | Panama | 0.0000768 |
| Argentina | South Korea | 0.0000765 |
| Kenya | New Zealand | 0.0000763 |
| Kenya | New Caledonia | 0.0000763 |
| Kenya | Uruguay | 0.0000763 |
| Kenya | Panama | 0.0000760 |
| Argentina | Taiwan | 0.0000759 |
| Kenya | South Korea | 0.0000757 |
| Greece | Chile | 0.0000752 |
| Kenya | Taiwan | 0.0000752 |
| Switzerland | Ireland | 0.0000747 |
| China | Sweden | 0.0000743 |
| Greenland | Brazil | 0.0000737 |
| Finland | Ireland | 0.0000723 |
| Egypt | Unknown | 0.0000723 |
| France | Lithuania | 0.0000722 |
| Belgium | Denmark | 0.0000715 |
| Uganda | Ireland | 0.0000711 |
| Finland | Switzerland | 0.0000708 |
| Portugal | Spain | 0.0000704 |
| Portugal | Romania | 0.0000704 |
| Czech Republic | France | 0.0000702 |
| Brazil | Netherlands | 0.0000676 |
| Portugal | Mauritius | 0.0000669 |
| Uzbekistan | United Kingdom | 0.0000660 |
| Colombia | Czech Republic | 0.0000651 |
| Uzbekistan | Germany | 0.0000650 |
| Hong Kong | United States | 0.0000642 |
| Australia | Sweden | 0.0000626 |
| Israel | Israel | 0.0000616 |
| Switzerland | Uruguay | 0.0000610 |
| Switzerland | New Caledonia | 0.0000610 |
| Switzerland | Panama | 0.0000608 |
| United States | Japan | 0.0000597 |
| Mexico | Ireland | 0.0000590 |
| Greenland | Poland | 0.0000589 |
| Colombia | Iran | 0.0000581 |
| Belgium | Ireland | 0.0000581 |
| Luxembourg | United States | 0.0000580 |
| Greenland | Norway | 0.0000578 |
| Greenland | New Zealand | 0.0000578 |
| Greenland | New Caledonia | 0.0000578 |
| Greenland | Uruguay | 0.0000578 |
| Greenland | Panama | 0.0000576 |
| Greenland | South Korea | 0.0000574 |
| Greenland | Italy | 0.0000570 |
| Greenland | Taiwan | 0.0000570 |
| Nepal | Unknown | 0.0000567 |
| Uganda | Belgium | 0.0000558 |
| Argentina | Ireland | 0.0000555 |
| Sweden | Poland | 0.0000553 |
| Kenya | Ireland | 0.0000549 |
| United Kingdom | Lithuania | 0.0000542 |
| Ukraine | Czech Republic | 0.0000522 |
| Spain | New Caledonia | 0.0000514 |
| Spain | Uruguay | 0.0000514 |
| Spain | Panama | 0.0000512 |
| Uganda | Switzerland | 0.0000494 |
| Brazil | Canada | 0.0000492 |
| Belgium | Norway | 0.0000487 |
| New Zealand | Sweden | 0.0000486 |
| Latvia | United States | 0.0000484 |
| Hong Kong | Norway | 0.0000479 |
| Portugal | United Kingdom | 0.0000479 |
| China | New Zealand | 0.0000475 |
| Mexico | Belgium | 0.0000462 |
| Nigeria | Unknown | 0.0000460 |
| Ireland | Unknown | 0.0000457 |
| Israel | Australia | 0.0000454 |
| Netherlands | Colombia | 0.0000454 |
| Niue | Unknown | 0.0000449 |
| Sweden | Austria | 0.0000447 |
| Denmark | Turkey | 0.0000443 |
| Finland | Finland | 0.0000438 |
| Argentina | Belgium | 0.0000435 |
| Kenya | Belgium | 0.0000431 |
| Portugal | Vietnam | 0.0000426 |
| Sweden | China | 0.0000422 |
| Antarctica | Poland | 0.0000420 |
| Argentina | Switzerland | 0.0000420 |
| Greenland | Ireland | 0.0000416 |
| Brazil | Brazil | 0.0000414 |
| Mexico | Switzerland | 0.0000409 |
| Portugal | France | 0.0000409 |
| Netherlands | Nigeria | 0.0000403 |
| Costa Rica | Unknown | 0.0000398 |
| South Africa | Germany | 0.0000392 |
| Brazil | Poland | 0.0000389 |
| Kenya | Switzerland | 0.0000381 |
| Taiwan | Denmark | 0.0000371 |
| Sweden | Iran | 0.0000369 |
| Japan | Lithuania | 0.0000362 |
| China | Lithuania | 0.0000361 |
| Italy | Russia | 0.0000357 |
| Niue | Russia | 0.0000357 |
| Egypt | United States | 0.0000357 |
| Netherlands | Vietnam | 0.0000353 |
| Poland | Peru | 0.0000349 |
| Portugal | Australia | 0.0000342 |
| Russia | Israel | 0.0000341 |
| Canada | Turkey | 0.0000338 |
| Austria | Denmark | 0.0000337 |
| Finland | Singapore | 0.0000336 |
| Germany | Curaçao | 0.0000334 |
| Switzerland | Vietnam | 0.0000331 |
| Poland | Brazil | 0.0000328 |
| Greenland | Belgium | 0.0000326 |
| Uganda | Czech Republic | 0.0000324 |
| Belgium | Czech Republic | 0.0000324 |
| United Kingdom | Iran | 0.0000318 |
| Bulgaria | South Africa | 0.0000317 |
| Brazil | Austria | 0.0000312 |
| Uzbekistan | Netherlands | 0.0000306 |
| Argentina | Czech Republic | 0.0000304 |
| Antarctica | New Zealand | 0.0000303 |
| Colombia | Nigeria | 0.0000302 |
| New Zealand | Nigeria | 0.0000302 |
| Spain | Nigeria | 0.0000302 |
| Singapore | United States | 0.0000299 |
| Brazil | Denmark | 0.0000298 |
| Brazil | China | 0.0000293 |
| Brazil | South Korea | 0.0000291 |
| Hong Kong | Germany | 0.0000290 |
| Belgium | Israel | 0.0000289 |
| Hong Kong | Australia | 0.0000289 |
| Hong Kong | Czech Republic | 0.0000289 |
| Hong Kong | Israel | 0.0000289 |
| Hong Kong | Italy | 0.0000289 |
| Sweden | Norway | 0.0000289 |
| Greenland | Switzerland | 0.0000289 |
| Sweden | New Zealand | 0.0000289 |
| Sweden | New Caledonia | 0.0000289 |
| Sweden | Uruguay | 0.0000289 |
| Sweden | Italy | 0.0000288 |
| Sweden | Panama | 0.0000288 |
| Brazil | Peru | 0.0000288 |
| Sweden | South Korea | 0.0000287 |
| Kenya | Czech Republic | 0.0000286 |
| Sweden | Taiwan | 0.0000285 |
| South Africa | United Kingdom | 0.0000279 |
| Iceland | Unknown | 0.0000276 |
| Poland | South Africa | 0.0000269 |
| Mexico | Czech Republic | 0.0000269 |
| Luxembourg | Australia | 0.0000267 |
| Brazil | Norway | 0.0000261 |
| South Africa | Belgium | 0.0000260 |
| Poland | New Zealand | 0.0000257 |
| Poland | New Caledonia | 0.0000257 |
| Poland | Uruguay | 0.0000257 |
| Spain | Greece | 0.0000257 |
| Poland | Panama | 0.0000256 |
| Poland | South Korea | 0.0000255 |
| Poland | Taiwan | 0.0000253 |
| Israel | United Kingdom | 0.0000251 |
| Ukraine | South Africa | 0.0000240 |
| Unknown | Luxembourg | 0.0000236 |
| Uzbekistan | Austria | 0.0000236 |
| Belgium | Chile | 0.0000231 |
| Hong Kong | Chile | 0.0000231 |
| India | Denmark | 0.0000231 |
| South Africa | Italy | 0.0000226 |
| Australia | Iran | 0.0000226 |
| Uzbekistan | China | 0.0000223 |
| Latvia | Australia | 0.0000222 |
| Cyprus | Unknown | 0.0000222 |
| Germany | Vietnam | 0.0000217 |
| South Africa | Spain | 0.0000217 |
| Australia | Fiji | 0.0000212 |
| Ukraine | Curaçao | 0.0000212 |
| Spain | Czech Republic | 0.0000209 |
| Luxembourg | United Kingdom | 0.0000209 |
| Colombia | Bulgaria | 0.0000208 |
| Uzbekistan | Canada | 0.0000208 |
| Hong Kong | France | 0.0000208 |
| Uzbekistan | Peru | 0.0000207 |
| South Africa | Netherlands | 0.0000206 |
| Ireland | Italy | 0.0000206 |
| Luxembourg | Germany | 0.0000205 |
| Uzbekistan | Denmark | 0.0000204 |
| Israel | Denmark | 0.0000203 |
| Norway | Romania | 0.0000203 |
| Brazil | New Caledonia | 0.0000201 |
| Brazil | Uruguay | 0.0000201 |
| Brazil | Panama | 0.0000200 |
| South Africa | Peru | 0.0000200 |
| Russia | Germany | 0.0000199 |
| Brazil | Taiwan | 0.0000198 |
| Uzbekistan | Brazil | 0.0000194 |
| Antarctica | Canada | 0.0000194 |
| Egypt | Germany | 0.0000194 |
| Norway | Mauritius | 0.0000193 |
| Switzerland | Iran | 0.0000192 |
| Antarctica | United Kingdom | 0.0000190 |
| Greenland | Czech Republic | 0.0000190 |
| Poland | Ireland | 0.0000185 |
| Canada | Iran | 0.0000182 |
| Brazil | Spain | 0.0000181 |
| Ghana | Germany | 0.0000181 |
| Ghana | Peru | 0.0000181 |
| Czech Republic | United Kingdom | 0.0000174 |
| Latvia | United Kingdom | 0.0000174 |
| Antarctica | Netherlands | 0.0000172 |
| Italy | New Zealand | 0.0000171 |
| Latvia | Germany | 0.0000171 |
| South Africa | France | 0.0000171 |
| Taiwan | Sweden | 0.0000170 |
| Cyprus | Germany | 0.0000161 |
| Portugal | Canada | 0.0000161 |
| Mexico | Russia | 0.0000159 |
| Uzbekistan | Poland | 0.0000156 |
| Uzbekistan | Norway | 0.0000153 |
| Uzbekistan | New Zealand | 0.0000153 |
| Uzbekistan | New Caledonia | 0.0000153 |
| Uzbekistan | Uruguay | 0.0000153 |
| Antarctica | Belgium | 0.0000152 |
| Uzbekistan | Panama | 0.0000152 |
| South Africa | Australia | 0.0000152 |
| China | Peru | 0.0000152 |
| Uzbekistan | South Korea | 0.0000151 |
| Spain | Croatia | 0.0000150 |
| Uzbekistan | Italy | 0.0000150 |
| Uzbekistan | Taiwan | 0.0000150 |
| Argentina | Israel | 0.0000150 |
| Portugal | Finland | 0.0000149 |
| Poland | Belgium | 0.0000145 |
| Czech Republic | New Zealand | 0.0000143 |
| New Caledonia | United Kingdom | 0.0000138 |
| Brazil | Belgium | 0.0000136 |
| Poland | Switzerland | 0.0000131 |
| Egypt | United Kingdom | 0.0000131 |
| Portugal | Iran | 0.0000130 |
| Switzerland | Curaçao | 0.0000130 |
| Japan | Curaçao | 0.0000130 |
| Netherlands | Finland | 0.0000126 |
| United Kingdom | South Africa | 0.0000123 |
| South Africa | Romania | 0.0000123 |
| Bangladesh | Germany | 0.0000120 |
| South Africa | Mauritius | 0.0000117 |
| India | Poland | 0.0000117 |
| Germany | Finland | 0.0000117 |
| India | Austria | 0.0000116 |
| Peru | Unknown | 0.0000116 |
| Switzerland | Finland | 0.0000116 |
| Peru | Israel | 0.0000114 |
| United States | Argentina | 0.0000112 |
| Belgium | Austria | 0.0000111 |
| Greece | Canada | 0.0000111 |
| Israel | Austria | 0.0000111 |
| Uzbekistan | Ireland | 0.0000110 |
| Israel | Germany | 0.0000110 |
| Greece | Switzerland | 0.0000109 |
| Cyprus | Switzerland | 0.0000109 |
| Poland | Sweden | 0.0000109 |
| Czech Republic | Sweden | 0.0000108 |
| Costa Rica | Germany | 0.0000108 |
| India | United Kingdom | 0.0000107 |
| Israel | Poland | 0.0000107 |
| Japan | Turkey | 0.0000105 |
| Australia | Curaçao | 0.0000102 |
| India | France | 0.0000102 |
| Brazil | Czech Republic | 0.0000102 |
| Niue | Israel | 0.0000101 |
| Ireland | France | 0.0000101 |
| Ireland | Nigeria | 0.0000101 |
| Norway | Nigeria | 0.0000101 |
| South Africa | Nigeria | 0.0000101 |
| Portugal | Sweden | 0.0000100 |
| China | Argentina | 0.0000100 |
| India | Spain | 0.0000099 |
| Nigeria | United States | 0.0000097 |
| Luxembourg | Netherlands | 0.0000097 |
| Austria | Austria | 0.0000096 |
| Sweden | Czech Republic | 0.0000095 |
| China | China | 0.0000095 |
| Ireland | Spain | 0.0000094 |
| Chile | Sweden | 0.0000092 |
| India | Germany | 0.0000091 |
| Spain | Lithuania | 0.0000090 |
| Egypt | Australia | 0.0000089 |
| Singapore | Norway | 0.0000088 |
| Uzbekistan | Belgium | 0.0000086 |
| Ireland | United States | 0.0000083 |
| Greece | Russia | 0.0000082 |
| Israel | Netherlands | 0.0000082 |
| Austria | China | 0.0000082 |
| New Zealand | Curaçao | 0.0000082 |
| Costa Rica | Norway | 0.0000081 |
| Czech Republic | Netherlands | 0.0000080 |
| Latvia | Netherlands | 0.0000080 |
| Spain | South Africa | 0.0000080 |
| France | Croatia | 0.0000080 |
| Brazil | Russia | 0.0000079 |
| New Zealand | Russia | 0.0000078 |
| India | Netherlands | 0.0000078 |
| Argentina | France | 0.0000076 |
| Uzbekistan | Switzerland | 0.0000076 |
| Antarctica | Italy | 0.0000076 |
| Luxembourg | Austria | 0.0000074 |
| Denmark | Finland | 0.0000074 |
| Norway | Sweden | 0.0000073 |
| Israel | Canada | 0.0000073 |
| Slovenia | Unknown | 0.0000073 |
| Chile | France | 0.0000072 |
| Austria | Brazil | 0.0000072 |
| India | Canada | 0.0000071 |
| Luxembourg | China | 0.0000070 |
| Portugal | Switzerland | 0.0000070 |
| Costa Rica | Australia | 0.0000070 |
| Costa Rica | New Zealand | 0.0000068 |
| Singapore | New Zealand | 0.0000068 |
| Canada | Croatia | 0.0000068 |
| Uruguay | Switzerland | 0.0000067 |
| Netherlands | Greece | 0.0000067 |
| Colombia | Israel | 0.0000066 |
| Luxembourg | Canada | 0.0000066 |
| Luxembourg | Peru | 0.0000065 |
| Austria | Italy | 0.0000065 |
| Taiwan | United Kingdom | 0.0000065 |
| Brazil | Israel | 0.0000065 |
| Luxembourg | Denmark | 0.0000064 |
| China | Norway | 0.0000064 |
| China | Uruguay | 0.0000064 |
| China | Panama | 0.0000064 |
| China | South Korea | 0.0000064 |
| China | Taiwan | 0.0000063 |
| Brazil | Sweden | 0.0000063 |
| Egypt | Peru | 0.0000063 |
| Czech Republic | Austria | 0.0000062 |
| Latvia | Austria | 0.0000062 |
| New Zealand | Greece | 0.0000062 |
| Luxembourg | Brazil | 0.0000061 |
| Singapore | Denmark | 0.0000061 |
| Czech Republic | China | 0.0000059 |
| Latvia | China | 0.0000059 |
| South Africa | Canada | 0.0000058 |
| Austria | Norway | 0.0000056 |
| Austria | New Zealand | 0.0000056 |
| Austria | Uruguay | 0.0000056 |
| Austria | Panama | 0.0000056 |
| Austria | South Korea | 0.0000056 |
| Austria | Taiwan | 0.0000055 |
| Czech Republic | Canada | 0.0000055 |
| Latvia | Canada | 0.0000055 |
| Czech Republic | Peru | 0.0000054 |
| Latvia | Peru | 0.0000054 |
| Czech Republic | Denmark | 0.0000054 |
| Latvia | Denmark | 0.0000054 |
| Antarctica | Brazil | 0.0000053 |
| Costa Rica | United States | 0.0000052 |
| Portugal | Japan | 0.0000052 |
| Chile | Spain | 0.0000052 |
| Chile | Romania | 0.0000052 |
| Czech Republic | Brazil | 0.0000051 |
| Latvia | Brazil | 0.0000051 |
| New Zealand | Fiji | 0.0000050 |
| Uzbekistan | Czech Republic | 0.0000050 |
| Luxembourg | Poland | 0.0000049 |
| Chile | Mauritius | 0.0000049 |
| Israel | Norway | 0.0000048 |
| Luxembourg | Norway | 0.0000048 |
| Luxembourg | New Zealand | 0.0000048 |
| Luxembourg | New Caledonia | 0.0000048 |
| Luxembourg | Uruguay | 0.0000048 |
| Spain | Israel | 0.0000048 |
| Luxembourg | Panama | 0.0000048 |
| Luxembourg | South Korea | 0.0000048 |
| Spain | Curaçao | 0.0000048 |
| Luxembourg | Italy | 0.0000048 |
| Luxembourg | Taiwan | 0.0000047 |
| Ukraine | Sweden | 0.0000047 |
| United Kingdom | Saudi Arabia | 0.0000047 |
| Ukraine | France | 0.0000047 |
| Taiwan | Greece | 0.0000046 |
| Netherlands | Turkey | 0.0000046 |
| Ukraine | Spain | 0.0000046 |
| Ukraine | Romania | 0.0000046 |
| Mexico | Israel | 0.0000045 |
| Unknown | Tunisia | 0.0000045 |
| Nigeria | Australia | 0.0000044 |
| Ukraine | Mauritius | 0.0000043 |
| Netherlands | Japan | 0.0000043 |
| United States | India | 0.0000042 |
| South Africa | Denmark | 0.0000041 |
| Portugal | Greece | 0.0000041 |
| Czech Republic | Poland | 0.0000041 |
| Latvia | Poland | 0.0000041 |
| Nepal | Germany | 0.0000040 |
| United States | Fiji | 0.0000040 |
| Latvia | Norway | 0.0000040 |
| Latvia | New Zealand | 0.0000040 |
| Czech Republic | New Caledonia | 0.0000040 |
| Czech Republic | Uruguay | 0.0000040 |
| Latvia | New Caledonia | 0.0000040 |
| Latvia | Uruguay | 0.0000040 |
| Switzerland | Japan | 0.0000040 |
| Czech Republic | Panama | 0.0000040 |
| Latvia | Panama | 0.0000040 |
| Czech Republic | South Korea | 0.0000040 |
| Latvia | South Korea | 0.0000040 |
| India | Switzerland | 0.0000040 |
| Norway | Russia | 0.0000040 |
| Latvia | Italy | 0.0000040 |
| Czech Republic | Taiwan | 0.0000040 |
| Latvia | Taiwan | 0.0000040 |
| India | Czech Republic | 0.0000038 |
| Norway | Iran | 0.0000037 |
| China | South Africa | 0.0000037 |
| South Africa | Austria | 0.0000037 |
| Greece | Hungary | 0.0000036 |
| Kenya | Israel | 0.0000036 |
| Kenya | France | 0.0000036 |
| Denmark | Argentina | 0.0000036 |
| South Africa | China | 0.0000035 |
| Israel | Czech Republic | 0.0000035 |
| Saudi Arabia | Germany | 0.0000035 |
| Nigeria | United Kingdom | 0.0000035 |
| Luxembourg | Ireland | 0.0000035 |
| Saudi Arabia | Switzerland | 0.0000034 |
| New Zealand | Hungary | 0.0000034 |
| Nepal | Belgium | 0.0000034 |
| Malaysia | Australia | 0.0000034 |
| Malaysia | New Zealand | 0.0000034 |
| Malaysia | Unknown | 0.0000034 |
| Nigeria | Germany | 0.0000034 |
| Egypt | Netherlands | 0.0000032 |
| Germany | Bulgaria | 0.0000031 |
| Brazil | Colombia | 0.0000031 |
| South Africa | Brazil | 0.0000031 |
| Austria | Switzerland | 0.0000030 |
| Brazil | Portugal | 0.0000030 |
| Bulgaria | Israel | 0.0000029 |
| Switzerland | South Africa | 0.0000029 |
| Canada | Finland | 0.0000029 |
| Czech Republic | Ireland | 0.0000029 |
| Brazil | Chile | 0.0000029 |
| British Virgin Islands | Unknown | 0.0000029 |
| Argentina | Chile | 0.0000029 |
| Colombia | Chile | 0.0000029 |
| Kenya | Chile | 0.0000029 |
| Spain | Chile | 0.0000029 |
| Latvia | Ireland | 0.0000029 |
| Iceland | United States | 0.0000029 |
| Portugal | Denmark | 0.0000028 |
| Italy | Canada | 0.0000028 |
| Belgium | Belgium | 0.0000027 |
| Israel | Belgium | 0.0000027 |
| United Kingdom | Curaçao | 0.0000027 |
| Luxembourg | Belgium | 0.0000027 |
| Canada | Argentina | 0.0000027 |
| Poland | Bulgaria | 0.0000027 |
| Israel | Peru | 0.0000026 |
| Germany | Japan | 0.0000026 |
| Uganda | Sweden | 0.0000025 |
| South Africa | Poland | 0.0000025 |
| Egypt | Denmark | 0.0000025 |
| Egypt | Austria | 0.0000025 |
| British Virgin Islands | United Kingdom | 0.0000024 |
| South Africa | Norway | 0.0000024 |
| Luxembourg | Switzerland | 0.0000024 |
| South Africa | New Zealand | 0.0000024 |
| South Africa | New Caledonia | 0.0000024 |
| South Africa | Uruguay | 0.0000024 |
| South Africa | Panama | 0.0000024 |
| South Africa | South Korea | 0.0000024 |
| South Africa | Taiwan | 0.0000024 |
| Egypt | China | 0.0000023 |
| Nepal | Peru | 0.0000023 |
| United Kingdom | Bulgaria | 0.0000023 |
| Saudi Arabia | United States | 0.0000023 |
| Costa Rica | Belgium | 0.0000023 |
| Canada | Russia | 0.0000023 |
| Czech Republic | Belgium | 0.0000023 |
| Latvia | Belgium | 0.0000023 |
| South Africa | Iran | 0.0000023 |
| Unknown | Slovakia | 0.0000023 |
| Bulgaria | Colombia | 0.0000022 |
| Ireland | Australia | 0.0000022 |
| Egypt | Canada | 0.0000022 |
| Nepal | Netherlands | 0.0000021 |
| Austria | Czech Republic | 0.0000021 |
| Kenya | South Africa | 0.0000021 |
| China | Vietnam | 0.0000021 |
| Mexico | Sweden | 0.0000021 |
| Egypt | Brazil | 0.0000020 |
| Czech Republic | Switzerland | 0.0000020 |
| Latvia | Switzerland | 0.0000020 |
| India | Brazil | 0.0000020 |
| Argentina | Sweden | 0.0000020 |
| Kenya | Sweden | 0.0000019 |
| South Africa | Sweden | 0.0000018 |
| Slovenia | Denmark | 0.0000017 |
| Indonesia | Germany | 0.0000017 |
| Ireland | United Kingdom | 0.0000017 |
| South Africa | Ireland | 0.0000017 |
| Indonesia | Switzerland | 0.0000017 |
| France | Finland | 0.0000017 |
| Denmark | Romania | 0.0000017 |
| Nepal | Romania | 0.0000017 |
| Nepal | Spain | 0.0000017 |
| Ireland | Germany | 0.0000017 |
| Egypt | Poland | 0.0000016 |
| China | Mauritius | 0.0000016 |
| Denmark | Mauritius | 0.0000016 |
| Nepal | Mauritius | 0.0000016 |
| Nigeria | Netherlands | 0.0000016 |
| Egypt | Norway | 0.0000016 |
| Egypt | New Zealand | 0.0000016 |
| Egypt | New Caledonia | 0.0000016 |
| Egypt | Uruguay | 0.0000016 |
| Egypt | Panama | 0.0000016 |
| Egypt | South Korea | 0.0000016 |
| Egypt | Italy | 0.0000016 |
| Egypt | Taiwan | 0.0000016 |
| Luxembourg | Czech Republic | 0.0000016 |
| Costa Rica | Peru | 0.0000016 |
| Hungary | Germany | 0.0000015 |
| Portugal | China | 0.0000015 |
| Greenland | Sweden | 0.0000015 |
| Bangladesh | Unknown | 0.0000015 |
| Costa Rica | Netherlands | 0.0000014 |
| Brazil | South Africa | 0.0000014 |
| Portugal | Singapore | 0.0000014 |
| Nigeria | Austria | 0.0000014 |
| Canada | Singapore | 0.0000013 |
| South Korea | South Africa | 0.0000013 |
| Latvia | Czech Republic | 0.0000013 |
| Denmark | India | 0.0000013 |
| Uganda | France | 0.0000013 |
| Finland | Colombia | 0.0000013 |
| Colombia | Croatia | 0.0000013 |
| Saudi Arabia | Unknown | 0.0000012 |
| South Africa | Switzerland | 0.0000012 |
| Denmark | Bulgaria | 0.0000012 |
| Israel | China | 0.0000012 |
| Nigeria | China | 0.0000012 |
| Netherlands | Singapore | 0.0000012 |
| Egypt | Ireland | 0.0000012 |
| Taiwan | Taiwan | 0.0000012 |
| Indonesia | United States | 0.0000011 |
| Israel | Spain | 0.0000011 |
| Belgium | Romania | 0.0000011 |
| Brazil | Romania | 0.0000011 |
| Costa Rica | Romania | 0.0000011 |
| Costa Rica | Spain | 0.0000011 |
| Israel | Romania | 0.0000011 |
| Nigeria | Canada | 0.0000011 |
| France | Turkey | 0.0000011 |
| New Caledonia | Sweden | 0.0000011 |
| Nigeria | Peru | 0.0000011 |
| Belgium | Mauritius | 0.0000011 |
| Brazil | Mauritius | 0.0000011 |
| Costa Rica | Mauritius | 0.0000011 |
| Israel | Mauritius | 0.0000011 |
| Switzerland | Singapore | 0.0000011 |
| Taiwan | Poland | 0.0000011 |
| Nigeria | Denmark | 0.0000011 |
| Mexico | France | 0.0000011 |
| Israel | Brazil | 0.0000010 |
| Nigeria | Brazil | 0.0000010 |
| Niue | United Kingdom | 0.0000010 |
| Switzerland | Croatia | 0.0000010 |
| Canada | Hungary | 0.0000010 |
| Finland | Portugal | 0.0000010 |
| Nepal | United Kingdom | 0.0000010 |
| Nepal | France | 0.0000010 |
| Chile | Iran | 0.0000009 |
| Ireland | Denmark | 0.0000009 |
| Chile | Colombia | 0.0000009 |
| Canada | Bulgaria | 0.0000009 |
| Egypt | Belgium | 0.0000009 |
| Slovenia | Poland | 0.0000009 |
| Slovenia | Austria | 0.0000009 |
| Slovenia | United States | 0.0000009 |
| Nigeria | South Africa | 0.0000009 |
| Japan | Argentina | 0.0000008 |
| Ukraine | Iran | 0.0000008 |
| Nigeria | Poland | 0.0000008 |
| Ireland | Netherlands | 0.0000008 |
| Nigeria | Norway | 0.0000008 |
| Egypt | Switzerland | 0.0000008 |
| Israel | New Zealand | 0.0000008 |
| Nigeria | New Zealand | 0.0000008 |
| Belgium | Uruguay | 0.0000008 |
| Israel | New Caledonia | 0.0000008 |
| Israel | Uruguay | 0.0000008 |
| Nigeria | New Caledonia | 0.0000008 |
| Nigeria | Uruguay | 0.0000008 |
| Belgium | Panama | 0.0000008 |
| Israel | Panama | 0.0000008 |
| Nigeria | Panama | 0.0000008 |
| Israel | South Korea | 0.0000008 |
| Nigeria | South Korea | 0.0000008 |
| Israel | Italy | 0.0000008 |
| Israel | Taiwan | 0.0000008 |
| Nigeria | Taiwan | 0.0000008 |
| South Africa | Czech Republic | 0.0000008 |
| Slovenia | France | 0.0000008 |
| Germany | Croatia | 0.0000008 |
| Greenland | France | 0.0000008 |
| Japan | Vietnam | 0.0000007 |
| China | Finland | 0.0000007 |
| Germany | Singapore | 0.0000007 |
| Australia | Colombia | 0.0000007 |
| Austria | Curaçao | 0.0000007 |
| New Zealand | Colombia | 0.0000007 |
| Costa Rica | United Kingdom | 0.0000007 |
| Costa Rica | France | 0.0000006 |
| Taiwan | Italy | 0.0000006 |
| Indonesia | Unknown | 0.0000006 |
| Ireland | Austria | 0.0000006 |
| Spain | Bulgaria | 0.0000006 |
| Japan | Finland | 0.0000006 |
| United States | Luxembourg | 0.0000006 |
| Ireland | China | 0.0000006 |
| Israel | Ireland | 0.0000006 |
| Nigeria | Ireland | 0.0000006 |
| Ireland | Canada | 0.0000005 |
| Slovenia | Canada | 0.0000005 |
| Ireland | Peru | 0.0000005 |
| New Zealand | South Africa | 0.0000005 |
| Egypt | Czech Republic | 0.0000005 |
| Ireland | Brazil | 0.0000005 |
| Norway | Colombia | 0.0000005 |
| Slovenia | Netherlands | 0.0000005 |
| Slovenia | United Kingdom | 0.0000005 |
| Colombia | Colombia | 0.0000005 |
| Denmark | Russia | 0.0000005 |
| Spain | Russia | 0.0000005 |
| Nigeria | Belgium | 0.0000005 |
| Unknown | Sri Lanka | 0.0000005 |
| Ukraine | Colombia | 0.0000004 |
| Ireland | Poland | 0.0000004 |
| Ireland | Norway | 0.0000004 |
| Israel | Switzerland | 0.0000004 |
| Nigeria | Switzerland | 0.0000004 |
| Ireland | New Zealand | 0.0000004 |
| Ireland | New Caledonia | 0.0000004 |
| Ireland | Uruguay | 0.0000004 |
| Ireland | Panama | 0.0000004 |
| Ireland | South Korea | 0.0000004 |
| Ireland | Taiwan | 0.0000004 |
| Uzbekistan | Sweden | 0.0000004 |
| Netherlands | Argentina | 0.0000004 |
| Nepal | Canada | 0.0000004 |
| Antarctica | Israel | 0.0000003 |
| China | Bulgaria | 0.0000003 |
| Mexico | South Africa | 0.0000003 |
| Spain | India | 0.0000003 |
| Germany | India | 0.0000003 |
| China | Iran | 0.0000003 |
| Denmark | Iran | 0.0000003 |
| Nepal | Iran | 0.0000003 |
| United Arab Emirates | Italy | 0.0000003 |
| United Kingdom | Mexico | 0.0000003 |
| Slovenia | Czech Republic | 0.0000003 |
| Japan | Bulgaria | 0.0000003 |
| Kosovo | Ireland | 0.0000003 |
| Ireland | Ireland | 0.0000003 |
| Portugal | Poland | 0.0000003 |
| Uganda | Colombia | 0.0000003 |
| Italy | Ireland | 0.0000003 |
| Nigeria | Czech Republic | 0.0000003 |
| Bulgaria | Turkey | 0.0000003 |
| Japan | India | 0.0000003 |
| Nepal | Australia | 0.0000003 |
| Switzerland | Bulgaria | 0.0000003 |
| China | Japan | 0.0000003 |
| Antarctica | Croatia | 0.0000003 |
| United Kingdom | Croatia | 0.0000003 |
| Nepal | Sweden | 0.0000002 |
| Poland | Spain | 0.0000002 |
| Denmark | Japan | 0.0000002 |
| Denmark | Portugal | 0.0000002 |
| Costa Rica | Canada | 0.0000002 |
| Ireland | Belgium | 0.0000002 |
| Mexico | Colombia | 0.0000002 |
| New Zealand | Bulgaria | 0.0000002 |
| Peru | United States | 0.0000002 |
| Argentina | Colombia | 0.0000002 |
| Belgium | Iran | 0.0000002 |
| Brazil | Iran | 0.0000002 |
| Costa Rica | Iran | 0.0000002 |
| Israel | Iran | 0.0000002 |
| Kenya | Colombia | 0.0000002 |
| Unknown | Mexico | 0.0000002 |
| Denmark | Hungary | 0.0000002 |
| Spain | Hungary | 0.0000002 |
| Ireland | Switzerland | 0.0000002 |
| Australia | India | 0.0000002 |
| Uzbekistan | France | 0.0000002 |
| India | Sweden | 0.0000002 |
| Canada | Japan | 0.0000002 |
| Canada | Portugal | 0.0000002 |
| Brazil | Saudi Arabia | 0.0000002 |
| Belgium | South Africa | 0.0000002 |
| France | South Africa | 0.0000002 |
| Japan | South Africa | 0.0000002 |
| Italy | Taiwan | 0.0000002 |
| South Korea | Austria | 0.0000002 |
| Switzerland | Colombia | 0.0000002 |
| Costa Rica | Sweden | 0.0000002 |
| Greenland | Colombia | 0.0000002 |
| Unknown | Latvia | 0.0000002 |
| Japan | Japan | 0.0000001 |
| Netherlands | Bulgaria | 0.0000001 |
| Iceland | Netherlands | 0.0000001 |
| Finland | Bulgaria | 0.0000001 |
| Spain | Colombia | 0.0000001 |
| Ireland | Czech Republic | 0.0000001 |
| Sweden | Bulgaria | 0.0000001 |
| United Kingdom | Turkey | 0.0000001 |
| Luxembourg | Sweden | 0.0000001 |
| Belgium | Bulgaria | 0.0000001 |
| United States | Tunisia | 0.0000001 |
| Portugal | Italy | 0.0000001 |
| Chile | Turkey | 0.0000001 |
| United States | Mexico | 0.0000001 |
| Switzerland | Mexico | 0.0000001 |
| Latvia | Sweden | 0.0000001 |
| Germany | Turkey | 0.0000001 |
| Taiwan | Bulgaria | 0.0000001 |
| Portugal | Hungary | 0.0000001 |
| Australia | Turkey | 0.0000001 |
| Sweden | Colombia | 0.0000001 |
| New Zealand | Turkey | 0.0000001 |
| Russia | United States | 0.0000001 |
| Slovenia | Germany | 0.0000001 |
| Poland | Colombia | 0.0000001 |
| Nepal | Denmark | 0.0000001 |
| China | Singapore | 0.0000001 |
| Netherlands | Hungary | 0.0000001 |
| United Kingdom | Finland | 0.0000001 |
| Bulgaria | India | 0.0000001 |
| Switzerland | Hungary | 0.0000001 |
| Luxembourg | France | 0.0000001 |
| United States | Slovakia | 0.0000001 |
| Japan | Portugal | 0.0000001 |
| Norway | Turkey | 0.0000001 |
| Colombia | Turkey | 0.0000001 |
| Australia | Vietnam | 0.0000001 |
| Finland | Vietnam | 0.0000001 |
| United Kingdom | Vietnam | 0.0000001 |
| Latvia | France | 0.0000001 |
| Portugal | Luxembourg | 0.0000001 |
| Ukraine | Turkey | 0.0000001 |
| Portugal | India | 0.0000000 |
| Costa Rica | Denmark | 0.0000000 |
| South Korea | United States | 0.0000000 |
| Netherlands | Luxembourg | 0.0000000 |
| Germany | Hungary | 0.0000000 |
| Uzbekistan | Colombia | 0.0000000 |
| Hungary | France | 0.0000000 |
| Egypt | Sweden | 0.0000000 |
| Netherlands | India | 0.0000000 |
| Colombia | India | 0.0000000 |
| Switzerland | Luxembourg | 0.0000000 |
| Switzerland | India | 0.0000000 |
| China | Israel | 0.0000000 |
| Hungary | Italy | 0.0000000 |
| Italy | Austria | 0.0000000 |
| Hungary | United Kingdom | 0.0000000 |
| Finland | Turkey | 0.0000000 |
| Uganda | Turkey | 0.0000000 |
| Germany | Luxembourg | 0.0000000 |
| Mexico | Turkey | 0.0000000 |
| Argentina | Turkey | 0.0000000 |
| Netherlands | Portugal | 0.0000000 |
| Kenya | Turkey | 0.0000000 |
| Japan | Singapore | 0.0000000 |
| Australia | Finland | 0.0000000 |
| Bulgaria | Argentina | 0.0000000 |
| Egypt | France | 0.0000000 |
| Nigeria | Sweden | 0.0000000 |
| Switzerland | Turkey | 0.0000000 |
| Greenland | Turkey | 0.0000000 |
| Greece | United Kingdom | 0.0000000 |
| Iceland | United Kingdom | 0.0000000 |
| Singapore | United Kingdom | 0.0000000 |
| Spain | Turkey | 0.0000000 |
| Luxembourg | Colombia | 0.0000000 |
| Brazil | Mexico | 0.0000000 |
| United States | Sri Lanka | 0.0000000 |
| Czech Republic | Colombia | 0.0000000 |
| Latvia | Colombia | 0.0000000 |
| Nigeria | France | 0.0000000 |
| Ireland | Sweden | 0.0000000 |
| Portugal | Tunisia | 0.0000000 |
| Sweden | Turkey | 0.0000000 |
| Chile | Argentina | 0.0000000 |
| Bulgaria | Finland | 0.0000000 |
| Poland | Turkey | 0.0000000 |
| Netherlands | Tunisia | 0.0000000 |
| Switzerland | Tunisia | 0.0000000 |
| Germany | Argentina | 0.0000000 |
| France | India | 0.0000000 |
| United Kingdom | Japan | 0.0000000 |
| Nepal | Poland | 0.0000000 |
| Australia | Japan | 0.0000000 |
| South Africa | Colombia | 0.0000000 |
| Australia | Argentina | 0.0000000 |
| Finland | Japan | 0.0000000 |
| Brazil | Turkey | 0.0000000 |
| Austria | Finland | 0.0000000 |
| New Zealand | Argentina | 0.0000000 |
| Hungary | Switzerland | 0.0000000 |
| France | Japan | 0.0000000 |
| France | Portugal | 0.0000000 |
| Hong Kong | United Kingdom | 0.0000000 |
| Canada | India | 0.0000000 |
| Germany | Tunisia | 0.0000000 |
| Portugal | Czech Republic | 0.0000000 |
| Portugal | Slovakia | 0.0000000 |
| Uzbekistan | Turkey | 0.0000000 |
| United Arab Emirates | United Kingdom | 0.0000000 |
| Costa Rica | Poland | 0.0000000 |
| Norway | Argentina | 0.0000000 |
| Egypt | Colombia | 0.0000000 |
| Colombia | Argentina | 0.0000000 |
| Peru | Netherlands | 0.0000000 |
| Netherlands | Slovakia | 0.0000000 |
| China | Hungary | 0.0000000 |
| Ukraine | Argentina | 0.0000000 |
| United States | Latvia | 0.0000000 |
| Switzerland | Slovakia | 0.0000000 |
| Chile | Finland | 0.0000000 |
| Hong Kong | Canada | 0.0000000 |
| India | Bulgaria | 0.0000000 |
| Greece | Austria | 0.0000000 |
| Chile | Bulgaria | 0.0000000 |
| Nepal | Italy | 0.0000000 |
| China | India | 0.0000000 |
| Finland | Argentina | 0.0000000 |
| China | Luxembourg | 0.0000000 |
| Uganda | Argentina | 0.0000000 |
| Germany | Slovakia | 0.0000000 |
| New Zealand | Finland | 0.0000000 |
| Hungary | Poland | 0.0000000 |
| United Arab Emirates | Netherlands | 0.0000000 |
| Belgium | Colombia | 0.0000000 |
| Israel | Colombia | 0.0000000 |
| Nigeria | Colombia | 0.0000000 |
| Mexico | Argentina | 0.0000000 |
| China | Turkey | 0.0000000 |
| Argentina | Argentina | 0.0000000 |
| Kenya | Argentina | 0.0000000 |
| Austria | Turkey | 0.0000000 |
| Costa Rica | Italy | 0.0000000 |
| Norway | Finland | 0.0000000 |
| Israel | Bulgaria | 0.0000000 |
| Colombia | Finland | 0.0000000 |
| Norway | Bulgaria | 0.0000000 |
| Ukraine | Finland | 0.0000000 |
| Switzerland | Argentina | 0.0000000 |
| Luxembourg | Turkey | 0.0000000 |
| Niue | United States | 0.0000000 |
| Greenland | Argentina | 0.0000000 |
| Argentina | Bulgaria | 0.0000000 |
| Japan | Hungary | 0.0000000 |
| Bulgaria | Japan | 0.0000000 |
| Bulgaria | Portugal | 0.0000000 |
| Ukraine | Bulgaria | 0.0000000 |
| Spain | Argentina | 0.0000000 |
| Czech Republic | Turkey | 0.0000000 |
| Latvia | Turkey | 0.0000000 |
| Ireland | Colombia | 0.0000000 |
| British Virgin Islands | Netherlands | 0.0000000 |
| Uganda | Finland | 0.0000000 |
| United Kingdom | India | 0.0000000 |
| Portugal | Sri Lanka | 0.0000000 |
| Antarctica | India | 0.0000000 |
| Hong Kong | Austria | 0.0000000 |
| Japan | Luxembourg | 0.0000000 |
| Uganda | Bulgaria | 0.0000000 |
| Mexico | Finland | 0.0000000 |
| Netherlands | Sri Lanka | 0.0000000 |
| Argentina | Finland | 0.0000000 |
| Costa Rica | Finland | 0.0000000 |
| Kenya | Finland | 0.0000000 |
| South Africa | Turkey | 0.0000000 |
| Switzerland | Sri Lanka | 0.0000000 |
| Sweden | Argentina | 0.0000000 |
| Mexico | Bulgaria | 0.0000000 |
| Uganda | Spain | 0.0000000 |
| United Kingdom | Portugal | 0.0000000 |
| Poland | Argentina | 0.0000000 |
| Kenya | Bulgaria | 0.0000000 |
| Slovenia | Sweden | 0.0000000 |
| Argentina | Spain | 0.0000000 |
| Greenland | Finland | 0.0000000 |
| Chile | Japan | 0.0000000 |
| Chile | Lithuania | 0.0000000 |
| Chile | Portugal | 0.0000000 |
| Mexico | Spain | 0.0000000 |
| Spain | Finland | 0.0000000 |
| Kenya | Spain | 0.0000000 |
| Brazil | Argentina | 0.0000000 |
| Egypt | Turkey | 0.0000000 |
| Germany | Sri Lanka | 0.0000000 |
| Germany | Portugal | 0.0000000 |
| Greenland | Bulgaria | 0.0000000 |
| China | Tunisia | 0.0000000 |
| Australia | Lithuania | 0.0000000 |
| Australia | Portugal | 0.0000000 |
| New Zealand | Japan | 0.0000000 |
| New Zealand | Portugal | 0.0000000 |
| Portugal | Russia | 0.0000000 |
| Israel | Finland | 0.0000000 |
| Greenland | Spain | 0.0000000 |
| Uzbekistan | Argentina | 0.0000000 |
| Netherlands | Russia | 0.0000000 |
| Portugal | Latvia | 0.0000000 |
| Switzerland | Russia | 0.0000000 |
| Norway | Japan | 0.0000000 |
| Norway | Lithuania | 0.0000000 |
| Norway | Portugal | 0.0000000 |
| Sweden | Finland | 0.0000000 |
| Colombia | Japan | 0.0000000 |
| Colombia | Portugal | 0.0000000 |
| Netherlands | Latvia | 0.0000000 |
| Ukraine | Japan | 0.0000000 |
| Ukraine | Lithuania | 0.0000000 |
| Ukraine | Portugal | 0.0000000 |
| Hungary | Netherlands | 0.0000000 |
| Poland | Finland | 0.0000000 |
| Switzerland | Latvia | 0.0000000 |
| Belgium | Turkey | 0.0000000 |
| Israel | Turkey | 0.0000000 |
| Nigeria | Turkey | 0.0000000 |
| China | Slovakia | 0.0000000 |
| Portugal | Ireland | 0.0000000 |
| Brazil | Finland | 0.0000000 |
| Austria | Bulgaria | 0.0000000 |
| Germany | Latvia | 0.0000000 |
| Brazil | Bulgaria | 0.0000000 |
| Finland | Lithuania | 0.0000000 |
| Japan | Tunisia | 0.0000000 |
| Uganda | Japan | 0.0000000 |
| Uganda | Lithuania | 0.0000000 |
| Uganda | Portugal | 0.0000000 |
| Slovenia | Bulgaria | 0.0000000 |
| Uzbekistan | Finland | 0.0000000 |
| Mexico | Japan | 0.0000000 |
| Mexico | Lithuania | 0.0000000 |
| Mexico | Portugal | 0.0000000 |
| Argentina | Japan | 0.0000000 |
| Argentina | Lithuania | 0.0000000 |
| Argentina | Portugal | 0.0000000 |
| Kenya | Japan | 0.0000000 |
| Kenya | Lithuania | 0.0000000 |
| Kenya | Portugal | 0.0000000 |
| Uzbekistan | Bulgaria | 0.0000000 |
| Ireland | Turkey | 0.0000000 |
| Luxembourg | Argentina | 0.0000000 |
| Switzerland | Lithuania | 0.0000000 |
| Switzerland | Portugal | 0.0000000 |
| Uzbekistan | Spain | 0.0000000 |
| Czech Republic | Argentina | 0.0000000 |
| Latvia | Argentina | 0.0000000 |
| Australia | Hungary | 0.0000000 |
| Finland | Hungary | 0.0000000 |
| United Kingdom | Hungary | 0.0000000 |
| Greenland | Japan | 0.0000000 |
| Greenland | Lithuania | 0.0000000 |
| Greenland | Portugal | 0.0000000 |
| Spain | Japan | 0.0000000 |
| Spain | Portugal | 0.0000000 |
| Japan | Slovakia | 0.0000000 |
| Portugal | Mexico | 0.0000000 |
| Netherlands | Mexico | 0.0000000 |
| Australia | Luxembourg | 0.0000000 |
| Finland | Luxembourg | 0.0000000 |
| United Kingdom | Luxembourg | 0.0000000 |
| South Africa | Argentina | 0.0000000 |
| Finland | India | 0.0000000 |
| Hong Kong | Switzerland | 0.0000000 |
| Luxembourg | Finland | 0.0000000 |
| Sweden | Japan | 0.0000000 |
| Sweden | Lithuania | 0.0000000 |
| Sweden | Portugal | 0.0000000 |
| China | Sri Lanka | 0.0000000 |
| Poland | Japan | 0.0000000 |
| Poland | Lithuania | 0.0000000 |
| Poland | Portugal | 0.0000000 |
| Luxembourg | Bulgaria | 0.0000000 |
| Czech Republic | Finland | 0.0000000 |
| Egypt | Argentina | 0.0000000 |
| Latvia | Finland | 0.0000000 |
| Brazil | Japan | 0.0000000 |
| Brazil | Lithuania | 0.0000000 |
| Czech Republic | Bulgaria | 0.0000000 |
| Latvia | Bulgaria | 0.0000000 |
| Luxembourg | Spain | 0.0000000 |
| Czech Republic | Spain | 0.0000000 |
| Latvia | Spain | 0.0000000 |
| Uzbekistan | Japan | 0.0000000 |
| Uzbekistan | Lithuania | 0.0000000 |
| Uzbekistan | Portugal | 0.0000000 |
| Portugal | Uruguay | 0.0000000 |
| South Africa | Finland | 0.0000000 |
| China | Russia | 0.0000000 |
| South Africa | Bulgaria | 0.0000000 |
| Belgium | Argentina | 0.0000000 |
| Israel | Argentina | 0.0000000 |
| Nigeria | Argentina | 0.0000000 |
| Japan | Sri Lanka | 0.0000000 |
| China | Latvia | 0.0000000 |
| Egypt | Finland | 0.0000000 |
| Slovenia | Spain | 0.0000000 |
| Egypt | Bulgaria | 0.0000000 |
| Australia | Tunisia | 0.0000000 |
| Finland | Tunisia | 0.0000000 |
| United Kingdom | Tunisia | 0.0000000 |
| China | Portugal | 0.0000000 |
| Egypt | Spain | 0.0000000 |
| Ireland | Argentina | 0.0000000 |
| Austria | Japan | 0.0000000 |
| Austria | Lithuania | 0.0000000 |
| Austria | Portugal | 0.0000000 |
| Luxembourg | Japan | 0.0000000 |
| Luxembourg | Lithuania | 0.0000000 |
| Luxembourg | Portugal | 0.0000000 |
| Belgium | Finland | 0.0000000 |
| Nigeria | Finland | 0.0000000 |
| Japan | Russia | 0.0000000 |
| Czech Republic | Japan | 0.0000000 |
| Czech Republic | Lithuania | 0.0000000 |
| Czech Republic | Portugal | 0.0000000 |
| Latvia | Japan | 0.0000000 |
| Latvia | Lithuania | 0.0000000 |
| Latvia | Portugal | 0.0000000 |
| Nigeria | Bulgaria | 0.0000000 |
| Australia | Slovakia | 0.0000000 |
| Finland | Slovakia | 0.0000000 |
| United Kingdom | Slovakia | 0.0000000 |
| Japan | Latvia | 0.0000000 |
| Nigeria | Spain | 0.0000000 |
| South Africa | Japan | 0.0000000 |
| South Africa | Lithuania | 0.0000000 |
| South Africa | Portugal | 0.0000000 |
| Ireland | Finland | 0.0000000 |
| China | Mexico | 0.0000000 |
| Ireland | Bulgaria | 0.0000000 |
| Egypt | Japan | 0.0000000 |
| Egypt | Lithuania | 0.0000000 |
| Egypt | Portugal | 0.0000000 |
| Belgium | Japan | 0.0000000 |
| Belgium | Lithuania | 0.0000000 |
| Belgium | Portugal | 0.0000000 |
| Israel | Japan | 0.0000000 |
| Israel | Lithuania | 0.0000000 |
| Israel | Portugal | 0.0000000 |
| Nigeria | Japan | 0.0000000 |
| Nigeria | Lithuania | 0.0000000 |
| Nigeria | Portugal | 0.0000000 |
| Japan | Mexico | 0.0000000 |
| Australia | Sri Lanka | 0.0000000 |
| Finland | Sri Lanka | 0.0000000 |
| United Kingdom | Sri Lanka | 0.0000000 |
| Ireland | Japan | 0.0000000 |
| Ireland | Lithuania | 0.0000000 |
| Ireland | Portugal | 0.0000000 |
| Finland | Russia | 0.0000000 |
| Australia | Latvia | 0.0000000 |
| Finland | Latvia | 0.0000000 |
| United Kingdom | Latvia | 0.0000000 |
| Australia | Mexico | 0.0000000 |
| Finland | Mexico | 0.0000000 |
| France | Nigeria | 0.0000000 |
dependency_summary_noUnknown <- dependency_summary %>%
filter(Cited_Country != "Unknown" & Citing_Country != "Unknown")%>%
arrange(desc(Total_Dependency_Fraction))
dependency_summary <- dependency_summary %>%
  arrange(desc(Total_Dependency_Fraction))
### select dependency information for slugs and packages
cran_github_rdi <- cran_github %>%
select(Package, slug, Depends)
### rename columns
colnames(cran_github_rdi) <- c("Citing_Package", "slug", "Dependencies")
### Package citation column will be the unlisted dependencies column
cran_github_rdi$Package_Citation <- cran_github_rdi$Dependencies
### join commits information for the citing packages
cran_github_RDI <- cran_github_rdi %>%
inner_join(user_commits_total, by = "slug")%>%
select(Citing_Package, slug, Dependencies, login,
sector, total_additions, total_code_for_slug,
contribution_fraction_loc, Package_Citation) %>%
# Remove rows with NA in Depends
filter(!is.na(Package_Citation))
### rename columns on the basis of the citing package
colnames(cran_github_RDI) <- c("Citing_Package", "Citing_Slug", "Dependencies", "Citing_Login", "Citing_Sector",
"Citing_Additions", "Citing_Total_Slug_Additions", "Citing_Package_Fraction" , "Package_Citation")
### unlist the dependencies for joining
cran_github_RDI_network <- cran_github_RDI %>%
separate_rows(Package_Citation, sep = ",\\s*") %>%
filter(Package_Citation != "")
#### prepare commits information for cited packages
user_commits_rdi <- user_commits_total %>%
mutate(Package_Citation = str_split(slug, "/", simplify = TRUE)[, 2])%>%
select(login, sector, total_additions, total_code_for_slug, contribution_fraction_loc, Package_Citation)
colnames(user_commits_rdi) <- c( "Cited_Login", "Cited_Sector",
"Cited_Additions", "Cited_Total_Slug_Additions", "Cited_Package_Fraction", "Package_Citation")
### join cited package commit information to citing package dataframe
cran_github_rdi_full <- cran_github_RDI_network %>%
inner_join(user_commits_rdi, by = "Package_Citation")
### create dependency_fraction = citing package fraction multiplied by cited package fraction
cran_github_rdi_grouped <- cran_github_rdi_full %>%
  mutate(Dependency_Fraction = Citing_Package_Fraction * Cited_Package_Fraction)
# Group by cited sector and citing sector, and sum Dependency_Fraction
### The number of citations made from one sector to another is the sum of the fractional scores for that pair; summing across all pairs gives the total number of citations in the data.
dependency_summary <- cran_github_rdi_grouped %>%
group_by(Cited_Sector, Citing_Sector) %>%
summarize(Total_Dependency_Fraction = sum(Dependency_Fraction, na.rm = TRUE))
sum(dependency_summary$Total_Dependency_Fraction)
[1] 589
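The total above is a whole number because fractional counting splits each dependency link across contributor pairs without creating or losing credit. A minimal sketch with made-up data (the package names, sectors, and fractions below are hypothetical, and it assumes each package's contribution fractions sum to 1) shows the same behaviour using base R only:

```r
# Hypothetical toy data: contributors to the citing packages and their
# share of each package's lines of code (shares sum to 1 per package).
citing <- data.frame(
  Citing_Package          = c("pkgA", "pkgA", "pkgB"),
  Citing_Sector           = c("Academic", "Business", "Academic"),
  Citing_Package_Fraction = c(0.6, 0.4, 1.0),
  Package_Citation        = c("pkgX", "pkgX", "pkgY")
)

# Hypothetical contributors to the cited packages, again with shares summing to 1.
cited <- data.frame(
  Package_Citation       = c("pkgX", "pkgX", "pkgY"),
  Cited_Sector           = c("Business", "Government", "Academic"),
  Cited_Package_Fraction = c(0.7, 0.3, 1.0)
)

# Pair every citing contributor with every cited contributor of the dependency.
pairs <- merge(citing, cited, by = "Package_Citation")

# Credit for each pair is the product of the two contribution fractions.
pairs$Dependency_Fraction <-
  pairs$Citing_Package_Fraction * pairs$Cited_Package_Fraction

# Sum the fractional credit for each (cited sector, citing sector) pair.
toy_summary <- aggregate(
  Dependency_Fraction ~ Cited_Sector + Citing_Sector,
  data = pairs, FUN = sum
)

# The grand total equals the number of dependency links (two in this toy case).
total <- sum(toy_summary$Dependency_Fraction)
print(toy_summary)
print(total)
```

If some contributors are dropped by a join, the fractions for a package no longer sum to 1 and the total will fall short of the number of links, so the grand total is a quick sanity check on the joins.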
# Group by cited sector and sum Total_Dependency_Fraction - total number of citations attributed to each sector
citations_by_sector <- dependency_summary %>%
group_by(Cited_Sector) %>%
summarize(Fraction_of_Citations = round(sum(Total_Dependency_Fraction, na.rm = TRUE), 4))
sum(citations_by_sector$Fraction_of_Citations)
[1] 589
citations_by_sector$Denominator_RDI <- round(citations_by_sector$Fraction_of_Citations / sum(citations_by_sector$Fraction_of_Citations),4)
# Group by citing sector and sum Total_Dependency_Fraction - total number of citations made by each sector
citings_by_sector <- dependency_summary %>%
group_by(Citing_Sector) %>%
summarize(Fraction_of_Citings = round(sum(Total_Dependency_Fraction, na.rm = TRUE), 4))
sum(citings_by_sector$Fraction_of_Citings)
[1] 588.9999
# join citings by sector with dependency_summary
citings_dependency_summary <- citings_by_sector %>%
full_join(dependency_summary, by = "Citing_Sector")
citings_dependency_summary$Numerator_RDI <- round(citings_dependency_summary$Total_Dependency_Fraction / citings_dependency_summary$Fraction_of_Citings,4)
## join denominator_RDI
citations_citings_dependency_summary <- citations_by_sector %>%
full_join(citings_dependency_summary, by = "Cited_Sector") %>%
select(Citing_Sector, Cited_Sector, Numerator_RDI, Denominator_RDI)
citations_citings_dependency_summary$RDI <- round(citations_citings_dependency_summary$Numerator_RDI / citations_citings_dependency_summary$Denominator_RDI,4)
# Calculate the total of Fraction_of_Citations, including "Unknown"
total_citations_incl_unknown <- sum(citations_by_country$Fraction_of_Citations)
# Create and round the percentage column to the nearest hundredth, including "Unknown" in the percentage calculation
citations_by_country$Percentage_of_Citations <- round(
(citations_by_country$Fraction_of_Citations / total_citations_incl_unknown) * 100, 2
)
# Arrange by descending order of the new percentage column
citations_by_country %>%
  arrange(desc(Percentage_of_Citations))
# A tibble: 70 × 4
Cited_Country Fraction_of_Citations Denominator_RDI Percentage_of_Citations
<chr> <dbl> <dbl> <dbl>
1 Unknown 200. 0.340 34.0
2 United States 172. 0.292 29.2
3 Germany 41.3 0.0702 7.02
4 France 30.6 0.0519 5.19
5 Denmark 23.1 0.0393 3.93
6 Canada 18.6 0.0316 3.16
7 Norway 14.9 0.0252 2.52
8 Netherlands 12.4 0.0211 2.11
9 Bulgaria 11.2 0.019 1.9
10 Australia 9.99 0.017 1.7
# ℹ 60 more rows
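The Denominator_RDI column in this table is one ingredient of the RDI computed in the code earlier in this section. As a reading of that code (the symbols are notation introduced here, not the authors'), write $D_{ij}$ for the Total_Dependency_Fraction from citing group $i$ to cited group $j$, where a group is a sector or a country:

```latex
\mathrm{RDI}_{ij}
  = \frac{\text{Numerator\_RDI}_{ij}}{\text{Denominator\_RDI}_{j}}
  = \frac{D_{ij} \,\big/\, \sum_{k} D_{ik}}
         {\sum_{i'} D_{i'j} \,\big/\, \sum_{i'}\sum_{k} D_{i'k}}
```

The numerator is the share of group $i$'s citations that go to group $j$; the denominator is group $j$'s share of all citations. Under this reading, a value above 1 means $i$ depends on $j$ more than $j$'s overall share of citations would suggest, and a value below 1 means less.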
# Calculate the total of Fraction_of_Citations, excluding "Unknown"
total_citations <- sum(citations_by_country$Fraction_of_Citations[citations_by_country$Cited_Country != "Unknown"])
# Create and round the percentage column to the nearest hundredth, excluding "Unknown" in the percentage calculation
citations_by_country$Percentage_of_Citations <- ifelse(
citations_by_country$Cited_Country == "Unknown",
NA,
round((citations_by_country$Fraction_of_Citations / total_citations) * 100, 2)
)
citations_by_country %>%
  arrange(desc(Percentage_of_Citations))
# A tibble: 70 × 4
Cited_Country Fraction_of_Citations Denominator_RDI Percentage_of_Citations
<chr> <dbl> <dbl> <dbl>
1 United States 172. 0.292 44.2
2 Germany 41.3 0.0702 10.6
3 France 30.6 0.0519 7.87
4 Denmark 23.1 0.0393 5.95
5 Canada 18.6 0.0316 4.79
6 Norway 14.9 0.0252 3.83
7 Netherlands 12.4 0.0211 3.19
8 Bulgaria 11.2 0.019 2.88
9 Australia 9.99 0.017 2.57
10 United Kingdom 9.65 0.0164 2.48
# ℹ 60 more rows
The following graph shows each country's lines-of-code credit compared with the percentage of citations it receives from other countries (reverse dependencies, in package language).
data <- data.frame(
Country = c("United States", "Germany", "United Kingdom", "France", "Canada",
"Australia", "Netherlands", "Switzerland", "Spain", "China",
"United States", "Germany", "United Kingdom", "France", "Canada",
"Australia", "Netherlands", "Switzerland", "Spain", "China"),
Measure = c("Package %", "Package %", "Package %", "Package %", "Package %",
"Package %", "Package %", "Package %", "Package %", "Package %",
"Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %",
"Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %", "Reverse Dependency %"),
Value = c( -30.9, -10.6, -7.6, -5.9, -5.3, -4.7, -3.6, -2.7, -2.6, -2.2,
44.2, 10.6, 2.48, 7.9, 4.8, 2.6, 3.2, 1, 1.4, .3) # Negative values are Package % (drawn to the left); positive values are Reverse Dependency % (drawn to the right)
)
# Filter data to include only Package % values for ordering
code_values <- data %>%
  filter(Measure == "Package %") %>%
  arrange(desc(Value))
# Reorder Country factor based on Package % values
data$Country <- factor(data$Country, levels = code_values$Country)
# Create the plot with value labels, ordered countries, and increased text size
plot <- ggplot(data, aes(x = Country, y = Value, fill = Measure)) +
geom_bar(stat = "identity", position = "identity") +
coord_flip() +
scale_y_continuous(labels = abs, breaks = seq(-50, 50, by = 10), limits = c(-50, 55)) +
labs(y = "Percentage", x = "", title = "R") +
theme_minimal() +
scale_fill_manual(values = c("Package %" = "darkblue", "Reverse Dependency %" = "lightblue")) + # Add your own colors
theme(
text = element_text(size = 14), # Changes global text size
axis.title = element_text(size = 16), # Changes axis title text size
axis.text = element_text(size = 12), # Changes axis text size
plot.title = element_text(size = 12, face = "bold", hjust = .5) # Changes plot title text size and makes it bold
)+
geom_text(data = subset(data, Value > 0), aes(label = sprintf("%0.1f%%", Value)),
position = position_nudge(y = 0.5), hjust = 0, size = 3.5) +
geom_text(data = subset(data, Value < 0), aes(label = sprintf("%0.1f%%", abs(Value))),
position = position_nudge(y = -0.5), hjust = 1, size = 3.5)+
theme(legend.position = "bottom")
# Display the plot
print(plot)
Who are the key players (projects, developers, institutions, sectors, and countries) on the networks and how has this changed over time?
How do the positions of OSS actors impact OSS contributions?
We take a look at the distributions of some of the impact measures by the sectors to see if certain sectors have packages of more impact.
It looks like the business sector has packages with the highest all-time downloads on average. This is looking at the log of the downloads for visual purposes.
## Show distribution of all-time downloads by sector
ggplot(cran_repos, aes(x = Sector, y = log(Downloads_All_Time), fill = Sector))+
geom_boxplot()+
ggtitle("All-Time Downloads Distribution by Sector (GitHub R Packages)")+
ylab("Log of All-Time Downloads")+
theme_gdocs()+
theme(plot.title = element_text(size = 13))+
coord_flip()+
xlab("")+
  scale_fill_westat(option = "BLUES", drop = FALSE)
The same is true of normalized downloads. These are probably packages from RStudio.
## Show distribution of normalized downloads by sector
ggplot(cran_repos, aes(x = Sector, y = log(Downloads_Normalized), fill = Sector))+
geom_boxplot()+
ggtitle("Normalized Downloads Distribution by Sector (GitHub R Packages)")+
ylab("Log of Normalized Downloads")+
theme_gdocs()+
theme(plot.title = element_text(size = 13))+
coord_flip()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
xlab("")+
  labs(caption = "*Data points represent individual packages")
For reverse dependencies, most sectors are at zero on average aside from government and business, which are about 1 on average. Because this is the natural log of reverse dependencies, a value of 1 corresponds to about 2.7 (e) reverse dependencies on average, and a value of 0 to a single reverse dependency. There are a lot of observations for the Unknown sector at the higher end.
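As a quick check on reading these log-scale axes, the sketch below converts a log value back to a count and shows what happens to packages with zero reverse dependencies. The counts are made up for illustration; `log1p()` is shown only as one possible alternative, not as the transformation used in this report.

```r
# log() in R is the natural logarithm, so exp() undoes it.
log_value     <- 1
implied_count <- exp(log_value)  # count implied by a log value of 1
base10_count  <- 10^log_value    # what the same value would mean under log10

# Hypothetical reverse-dependency counts, including a package with none.
counts <- c(0, 1, 3, 10)

# log(0) is -Inf; ggplot2 drops non-finite values from a boxplot with a
# warning, so zero-count packages do not contribute to the plotted boxes.
natural_log <- log(counts)

# log1p(x) computes log(1 + x), which keeps zero counts at 0 on the axis.
shifted_log <- log1p(counts)

print(data.frame(counts, natural_log, shifted_log))
print(c(implied_count = implied_count, base10_count = base10_count))
```

One consequence is that the boxplots on the log scale describe only packages with at least one reverse dependency (or star, fork, or download), which matters when comparing sectors with many zero-count packages.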
## Show distribution of reverse dependencies by sector
ggplot(cran_repos, aes(x = Sector, y = log(Reverse_Depends_Count), fill = Sector))+
geom_boxplot()+
ggtitle("Reverse Dependencies Distribution by Sector (GitHub R Packages)")+
ylab("Log of Reverse Dependencies")+
theme_gdocs()+
theme(plot.title = element_text(size = 13))+
coord_flip()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
  xlab("")
Business has the highest log of stars on average, followed by government. We know from the EDA that stars and downloads have a moderate correlation, so this makes sense.
## Show distribution of star counts by sector
ggplot(cran_repos, aes(x = Sector, y = log(stargazer_count), fill = Sector))+
geom_boxplot()+
ggtitle("Star Count Distribution by Sector (GitHub R Packages)")+
ylab("Log of Star Count")+
theme_gdocs()+
theme(plot.title = element_text(size = 13))+
coord_flip()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
xlab("")+
  labs(caption = "*Data points represent individual packages")
Business has the highest log of forks on average as well, followed by government again. We know from the EDA that stars and forks have a very high correlation, so this makes sense too.
## Show distribution of fork counts by sector
ggplot(cran_repos, aes(x = Sector, y = log(fork_count), fill = Sector))+
geom_boxplot()+
ggtitle("Fork Count Distribution by Sector (GitHub R Packages)")+
ylab("Log of Fork Count")+
theme_gdocs()+
theme(plot.title = element_text(size = 13))+
coord_flip()+
scale_fill_westat(option = "BLUES", drop = FALSE)+
xlab("")+
  labs(caption = "*Data points represent individual packages")
For the top institutions/organizations we found on GitHub, we look at the distribution of downloads for their packages.
### Filter for the top 10 institutions
Top_Institution_repos <- cran_repos %>%
  filter(Institution %in% top10_Institutions_GitHub$Institution) %>%
  filter(!is.na(Downloads_All_Time))
RStudio has the highest log of all-time downloads on average of the top 10 institutions on GitHub.
ggplot(Top_Institution_repos, aes(x = Institution, y = log(Downloads_All_Time), fill = Sector))+
geom_boxplot()+
ggtitle("All-Time Downloads Distribution by Institution - Top 10")+
ylab("Log of All-Time Downloads")+
ylim(0,20)+
theme_gdocs()+
theme(plot.title = element_text(size = 15))+
coord_flip()+
  scale_fill_westat(option = "BLUES", drop = FALSE)
The distributions are very similar for normalized downloads as well.
ggplot(Top_Institution_repos, aes(x = Institution, y = log(Downloads_Normalized), fill = Sector))+
geom_boxplot()+
ggtitle("Normalized Downloads Distribution by Institution - Top 10")+
ylab("Log of Normalized Downloads")+
ylim(0,20)+
theme_gdocs()+
theme(plot.title = element_text(size = 15))+
coord_flip()+
  scale_fill_westat(option = "BLUES", drop = FALSE)
For reverse dependencies, the log averages are essentially all zero. There are a few that are above this mark, with RStudio having the most observations at the higher end.
ggplot(Top_Institution_repos, aes(x = Institution, y = log(Reverse_Depends_Count), fill = Sector))+
geom_boxplot()+
ggtitle("Reverse Dependencies Distribution by Institution - Top 10")+
ylab("Log of Reverse Dependencies")+
ylim(0,20)+
theme_gdocs()+
theme(plot.title = element_text(size = 15))+
coord_flip()+
  scale_fill_westat(option = "BLUES", drop = FALSE)
Star count is led by RStudio as well; UCLA appears to be the next highest on average.
ggplot(Top_Institution_repos, aes(x = Institution, y = log(stargazer_count), fill = Sector))+
geom_boxplot()+
ggtitle("Star Count Distribution by Institution - Top 10")+
ylab("Log of Star Count")+
ylim(0,20)+
theme_gdocs()+
theme(plot.title = element_text(size = 15))+
coord_flip()+
  scale_fill_westat(option = "BLUES", drop = FALSE)
What is true of stars is mostly true of forks as well.
ggplot(Top_Institution_repos, aes(x = Institution, y = log(fork_count), fill = Sector))+
geom_boxplot()+
ggtitle("Fork Count Distribution by Institution - Top 10")+
ylab("Log of Fork Count")+
ylim(0,20)+
theme_gdocs()+
theme(plot.title = element_text(size = 15))+
coord_flip()+
  scale_fill_westat(option = "BLUES", drop = FALSE)
The following code is analysis for a working paper. Results are not commented on at the moment, but we want to see whether larger teams have more impact on average.
Test removing extreme team size outliers; add reverse dependencies, both normalized and non-normalized.
Literature on team size and citations: do we see the same thing?
## Create team size
user_commits_total <- user_commits_total %>%
group_by(slug) %>%
mutate(team_size = n()) %>%
ungroup()
## normalize stars and forks based on year_created
user_commits_total <- user_commits_total %>%
mutate(year_created = as.numeric(year_created))
user_commits_total <- user_commits_total %>%
mutate(
normalization_factor = ifelse(is.na(year_created), NA, 2023 - year_created + 1),
stargazer_count_normalized = ifelse(is.na(stargazer_count) | is.na(normalization_factor), stargazer_count, stargazer_count / normalization_factor),
fork_count_normalized = ifelse(is.na(fork_count) | is.na(normalization_factor), fork_count, fork_count / normalization_factor),
reverse_dep_normalized = ifelse(is.na(Reverse_Depends_Count) | is.na(normalization_factor), Reverse_Depends_Count, Reverse_Depends_Count / normalization_factor)
)
user_commits_distinct <- user_commits_total %>%
distinct(slug, .keep_all = TRUE)
quantile(user_commits_distinct$team_size, probs = seq(0, 1, .1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1 1 1 1 2 2 3 3 5 8 880
# Define the bins based on the updated percentiles
bins <- c(1, 2, 3, 5, 8, 880)
# Create labels for the bins
bin_labels <- c("[1]", "[2]", "[3-4]", "[5-7]", "[8-880]")
# Create a new column with binned team sizes and custom labels
user_commits_distinct <- user_commits_distinct %>%
mutate(team_size_bin = cut(team_size, breaks = bins, labels = bin_labels, include.lowest = TRUE, right = FALSE))
## Count packages in each team size bin
table(user_commits_distinct$team_size_bin)
[1] [2] [3-4] [5-7] [8-880]
2441 1804 1588 791 783
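The bin labels only match the bins if `cut()` is read with `right = FALSE`, which makes each interval closed on the left and open on the right, and `include.lowest = TRUE`, which in that mode closes the last interval so the maximum value is kept. A small sketch with hypothetical team sizes placed on and around each break shows where each value lands:

```r
# Hypothetical team sizes chosen to sit on and just below each break.
team_size <- c(1, 2, 3, 4, 5, 7, 8, 880)

# Same breaks and labels as the binning step above.
bins       <- c(1, 2, 3, 5, 8, 880)
bin_labels <- c("[1]", "[2]", "[3-4]", "[5-7]", "[8-880]")

# right = FALSE gives intervals [1,2), [2,3), [3,5), [5,8), [8,880];
# include.lowest = TRUE closes the final interval so 880 is not set to NA.
team_size_bin <- cut(team_size, breaks = bins, labels = bin_labels,
                     include.lowest = TRUE, right = FALSE)

print(data.frame(team_size, team_size_bin))
```

Because a break value always belongs to the bin that starts at it, labels should be written from each break up to one less than the next break.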
mean(user_commits_distinct$team_size)
[1] 4.619009
# Calculate Q1, Q3, and IQR
Q1 <- quantile(user_commits_distinct$team_size, 0.25)
Q3 <- quantile(user_commits_distinct$team_size, 0.75)
IQR <- Q3 - Q1
# Define lower and upper bounds
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
# Filter out outliers
user_commits_no_outliers1 <- user_commits_distinct %>%
filter(team_size >= lower_bound & team_size <= upper_bound)
quantile(user_commits_no_outliers1$team_size, probs = seq(0, 1, .1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1 1 1 1 2 2 2 3 4 5 8
# Define the bins based on the updated percentiles
bins_no_outliers1 <- c(1, 2, 3, 5, 8)
# Create labels for the bins
bin_labels_no_outliers1 <- c("[1]", "[2]", "[3-4]", "[5-8]")
# Create a new column with binned team sizes and custom labels
user_commits_no_outliers1 <- user_commits_no_outliers1 %>%
mutate(team_size_bin = cut(team_size, breaks = bins_no_outliers1, labels = bin_labels_no_outliers1, include.lowest = TRUE, right = FALSE))
# View the table of binned team sizes
table(user_commits_no_outliers1$team_size_bin)
[1] [2] [3-4] [5-8]
2441 1804 1588 909
# Calculate the mean and standard deviation of team_size
mean_team_size <- mean(user_commits_distinct$team_size)
sd_team_size <- sd(user_commits_distinct$team_size)
# Calculate Z-scores
user_commits_distinct <- user_commits_distinct %>%
mutate(z_score = (team_size - mean_team_size) / sd_team_size)
# Define a threshold for Z-scores (commonly 3 or 2.5)
z_threshold <- 3
# Filter out outliers based on Z-score
user_commits_no_outliers_z <- user_commits_distinct %>%
filter(abs(z_score) <= z_threshold)
quantile(user_commits_no_outliers_z$team_size, probs = seq(0, 1, .1))
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1.0 1.0 1.0 1.0 2.0 2.0 3.0 3.0 5.0 7.1 58.0
# Define the bins based on the updated percentiles
bins_no_outliers_z <- c(1, 2, 3, 5, 8, 58)
# Create labels for the bins (with right = FALSE, the breaks at 5 and 8 give bins 5-7 and 8-58)
bin_labels_no_outliers_z <- c("[1]", "[2]", "[3-4]", "[5-7]", "[8-58]")
# Create a new column with binned team sizes and custom labels
user_commits_no_outliers_z <- user_commits_no_outliers_z %>%
  mutate(team_size_bin = cut(team_size, breaks = bins_no_outliers_z, labels = bin_labels_no_outliers_z, include.lowest = TRUE, right = FALSE))
# View the table of binned team sizes
table(user_commits_no_outliers_z$team_size_bin)
[1] [2] [3-4] [5-7] [8-58]
2441 1804 1588 791 736
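The two outlier rules used above behave very differently on heavily right-skewed data such as team size: the IQR rule capped team size at 8, while the z-score rule kept teams of up to 58. A minimal sketch on a made-up skewed vector (the values are hypothetical, not drawn from the data) reproduces that pattern:

```r
# Hypothetical right-skewed team sizes: mostly small teams, a few large ones.
x <- c(rep(1, 40), rep(2, 25), rep(3, 15), rep(5, 10), rep(8, 6), 20, 60, 880)

# IQR rule: keep values within 1.5 * IQR of the quartiles.
q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1
kept_iqr <- x[x >= q1 - 1.5 * iqr & x <= q3 + 1.5 * iqr]

# Z-score rule: keep values within 3 standard deviations of the mean.
# The mean and sd are computed on the full vector, so the extreme value
# inflates both and makes the rule more permissive.
z      <- (x - mean(x)) / sd(x)
kept_z <- x[abs(z) <= 3]

print(c(n_iqr = length(kept_iqr), max_iqr = max(kept_iqr),
        n_z   = length(kept_z),   max_z   = max(kept_z)))
```

The quartiles are barely affected by the largest value, so the IQR fence stays tight, whereas a single extreme observation stretches the standard deviation enough that moderately large teams pass the z-score filter.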
# Define a custom theme
custom_theme <- theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 12),
axis.text.y = element_text(size = 12),
axis.title = element_text(size = 14, face = "bold"),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10)
)
# Plot log of stargazer_count_normalized vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(stargazer_count_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_westat(option = "BLUES", drop = FALSE) +
  labs(x = "Team Size Bin", y = "Log of Stargazer Count Normalized", fill = "Team Size") +
  custom_theme
# Plot log of fork_count_normalized vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(fork_count_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Log of Fork Count Normalized", fill = "Team Size") +
  custom_theme
# Plot log of Downloads_Normalized vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(Downloads_Normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Downloads Normalized", fill = "Team Size") +
  custom_theme
# Plot log of reverse_dep_normalized vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(reverse_dep_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Rev Dep Normalized", fill = "Team Size") +
  custom_theme
# Plot log of stargazer_count vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(stargazer_count), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_westat(option = "BLUES", drop = FALSE) +
  labs(x = "Team Size Bin", y = "Log of Stargazer Count", fill = "Team Size") +
  custom_theme
# Plot log of fork_count vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(fork_count), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Log of Fork Count", fill = "Team Size") +
  custom_theme
# Plot log of Downloads_All_Time vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(Downloads_All_Time), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Downloads", fill = "Team Size") +
  custom_theme
# Plot log of Reverse_Depends_Count vs. team_size_bin
ggplot(user_commits_distinct, aes(x = team_size_bin, y = log(Reverse_Depends_Count), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Rev Dep", fill = "Team Size") +
  custom_theme
# Plot log of stargazer_count_normalized vs. team_size_bin (IQR outliers removed)
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(stargazer_count_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_westat(option = "BLUES", drop = FALSE) +
  labs(x = "Team Size Bin", y = "Log of Stargazer Count Normalized", fill = "Team Size") +
  custom_theme
# Plot log of fork_count_normalized vs. team_size_bin (IQR outliers removed)
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(fork_count_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Log of Fork Count Normalized", fill = "Team Size") +
  custom_theme
# Plot log of Downloads_Normalized vs. team_size_bin (IQR outliers removed)
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(Downloads_Normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Downloads Normalized", fill = "Team Size") +
  custom_theme
# Plot log of reverse_dep_normalized vs. team_size_bin (IQR outliers removed)
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(reverse_dep_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Team Size Bin", y = "Log of Rev Dep Normalized", fill = "Team Size") +
  custom_theme
# Plot stargazer_count_normalized vs. team_size_bin
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(stargazer_count), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_westat(option = "BLUES", drop = FALSE) +
labs(x = "Team Size Bin", y = "Stargazer Count", fill = "Team Size") +
custom_theme# Plot fork_count_normalized vs. team_size_bin
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(fork_count), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_brewer(palette = "Set3") +
labs(x = "Team Size Bin", y = "Fork Count", fill = "Team Size") +
custom_theme

# Plot Downloads_All_Time vs. team_size_bin
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(Downloads_All_Time), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Downloads", fill = "Team Size") +
  custom_theme

# Plot Reverse_Depends_Count vs. team_size_bin
ggplot(user_commits_no_outliers1, aes(x = team_size_bin, y = log(Reverse_Depends_Count), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Rev Dep", fill = "Team Size") +
  custom_theme

# Define a custom theme
custom_theme <- theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 12),
axis.text.y = element_text(size = 12),
axis.title = element_text(size = 14, face = "bold"),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10)
)
# Plot stargazer_count_normalized vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(stargazer_count_normalized), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_westat(option = "BLUES", drop = FALSE) +
labs(x = "Team Size Bin", y = "Stargazer Count Normalized", fill = "Team Size") +
custom_theme

# Plot fork_count_normalized vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(fork_count_normalized), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_brewer(palette = "Set3") +
labs(x = "Team Size Bin", y = "Fork Count Normalized", fill = "Team Size") +
custom_theme

# Plot Downloads_Normalized vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(Downloads_Normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Downloads Normalized", fill = "Team Size") +
  custom_theme

# Plot reverse_dep_normalized vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(reverse_dep_normalized), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Rev Dep Normalized", fill = "Team Size") +
  custom_theme

# Define a custom theme
custom_theme <- theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 12),
axis.text.y = element_text(size = 12),
axis.title = element_text(size = 14, face = "bold"),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10)
)
# Plot stargazer_count vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(stargazer_count), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_westat(option = "BLUES", drop = FALSE) +
labs(x = "Team Size Bin", y = "Stargazer Count", fill = "Team Size") +
custom_theme

# Plot fork_count vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(fork_count), fill = team_size_bin)) +
geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
scale_fill_brewer(palette = "Set3") +
labs(x = "Team Size Bin", y = "Fork Count", fill = "Team Size") +
custom_theme

# Plot Downloads_All_Time vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(Downloads_All_Time), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Downloads", fill = "Team Size") +
  custom_theme

# Plot Reverse_Depends_Count vs. team_size_bin
ggplot(user_commits_no_outliers_z, aes(x = team_size_bin, y = log(Reverse_Depends_Count), fill = team_size_bin)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 21, outlier.size = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = "Team Size Bin", y = "Rev Dep", fill = "Team Size") +
  custom_theme

# Calculate correlation between team_size and the normalized counts
correlation_matrix <- user_commits_distinct %>%
select(team_size, stargazer_count_normalized, fork_count_normalized, Downloads_Normalized, reverse_dep_normalized) %>%
cor(use = "complete.obs")
# View the correlation matrix
print(correlation_matrix)
                           team_size stargazer_count_normalized
team_size                  1.0000000                 0.75543882
stargazer_count_normalized 0.7554388                 1.00000000
fork_count_normalized      0.7782930                 0.92576329
Downloads_Normalized       0.2759388                 0.10871598
reverse_dep_normalized     0.1787497                 0.07614058
                           fork_count_normalized Downloads_Normalized
team_size                             0.77829300            0.2759388
stargazer_count_normalized            0.92576329            0.1087160
fork_count_normalized                 1.00000000            0.1200080
Downloads_Normalized                  0.12000801            1.0000000
reverse_dep_normalized                0.09906453            0.2785460
                           reverse_dep_normalized
team_size                              0.17874974
stargazer_count_normalized             0.07614058
fork_count_normalized                  0.09906453
Downloads_Normalized                   0.27854596
reverse_dep_normalized                 1.00000000
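`cor()` defaults to Pearson correlation, which is sensitive to the heavy right tails typical of star, fork, and download counts. As a hedged robustness check (not part of the original analysis), rank-based Spearman correlation can be computed with the same call. The sketch below uses simulated data with hypothetical column names:

```r
set.seed(1)
# Simulated, right-skewed stand-ins for team size and an impact measure
team_size <- rpois(200, lambda = 3) + 1
stars     <- exp(0.3 * team_size + rnorm(200))
sim <- data.frame(team_size = team_size, stars = stars)

pearson  <- cor(sim, use = "complete.obs", method = "pearson")
spearman <- cor(sim, use = "complete.obs", method = "spearman")

# Spearman depends only on ranks, so a monotone transform such as log()
# leaves it unchanged
spearman_log <- cor(sim$team_size, log(sim$stars), method = "spearman")

print(round(pearson, 3))
print(round(spearman, 3))
```

If the Pearson and Spearman matrices disagree strongly on the real data, a few very large packages are probably driving the Pearson values.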
library(reshape2)
# Melt the correlation matrix
melted_correlation_matrix <- melt(correlation_matrix)
ggplot(data = melted_correlation_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "black") +
scale_fill_gradient2(low = "blue", high = "darkblue", mid = "lightblue", midpoint = 0, limit = c(-1, 1), space = "Lab", name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(title = "Correlation Matrix Heatmap", x = "", y = "")

# Calculate correlation between team_size and the raw counts
correlation_matrix <- user_commits_distinct %>%
select(team_size, stargazer_count, fork_count, Downloads_All_Time, Reverse_Depends_Count) %>%
cor(use = "complete.obs")
# View the correlation matrix
print(correlation_matrix)
                      team_size stargazer_count fork_count Downloads_All_Time
team_size             1.0000000       0.7912326  0.7598055          0.3070778
stargazer_count       0.7912326       1.0000000  0.9419632          0.1674230
fork_count            0.7598055       0.9419632  1.0000000          0.1852880
Downloads_All_Time    0.3070778       0.1674230  0.1852880          1.0000000
Reverse_Depends_Count 0.2372756       0.1674554  0.1951223          0.4449235
                      Reverse_Depends_Count
team_size                         0.2372756
stargazer_count                   0.1674554
fork_count                        0.1951223
Downloads_All_Time                0.4449235
Reverse_Depends_Count             1.0000000
# Melt the correlation matrix
melted_correlation_matrix <- melt(correlation_matrix)
ggplot(data = melted_correlation_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "black") +
scale_fill_gradient2(low = "blue", high = "darkblue", mid = "lightblue", midpoint = 0, limit = c(-1, 1), space = "Lab", name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(title = "Correlation Matrix Heatmap", x = "", y = "")

# Calculate correlation between team_size and the normalized counts, outliers removed
correlation_matrix <- user_commits_no_outliers_z %>%
select(team_size, stargazer_count_normalized, fork_count_normalized, Downloads_Normalized, reverse_dep_normalized) %>%
cor(use = "complete.obs")
# View the correlation matrix
print(correlation_matrix)
                           team_size stargazer_count_normalized
team_size                  1.0000000                 0.46840694
stargazer_count_normalized 0.4684069                 1.00000000
fork_count_normalized      0.5738767                 0.84916902
Downloads_Normalized       0.3410986                 0.13893791
reverse_dep_normalized     0.1162250                 0.04734392
                           fork_count_normalized Downloads_Normalized
team_size                              0.5738767            0.3410986
stargazer_count_normalized             0.8491690            0.1389379
fork_count_normalized                  1.0000000            0.1713354
Downloads_Normalized                   0.1713354            1.0000000
reverse_dep_normalized                 0.0756769            0.1463907
                           reverse_dep_normalized
team_size                              0.11622504
stargazer_count_normalized             0.04734392
fork_count_normalized                  0.07567690
Downloads_Normalized                   0.14639068
reverse_dep_normalized                 1.00000000
# Melt the correlation matrix
melted_correlation_matrix <- melt(correlation_matrix)
ggplot(data = melted_correlation_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "black") +
scale_fill_gradient2(low = "blue", high = "darkblue", mid = "lightblue", midpoint = 0, limit = c(-1, 1), space = "Lab", name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(title = "Correlation Matrix Heatmap", x = "", y = "")

# Calculate correlation between team_size and the raw counts, outliers removed
correlation_matrix <- user_commits_no_outliers_z %>%
select(team_size, stargazer_count, fork_count, Downloads_All_Time, Reverse_Depends_Count) %>%
cor(use = "complete.obs")
# View the correlation matrix
print(correlation_matrix)
                      team_size stargazer_count fork_count Downloads_All_Time
team_size             1.0000000       0.5699283  0.6312434          0.3692510
stargazer_count       0.5699283       1.0000000  0.8223352          0.2075291
fork_count            0.6312434       0.8223352  1.0000000          0.2449997
Downloads_All_Time    0.3692510       0.2075291  0.2449997          1.0000000
Reverse_Depends_Count 0.1569053       0.1003055  0.1365102          0.2489343
                      Reverse_Depends_Count
team_size                         0.1569053
stargazer_count                   0.1003055
fork_count                        0.1365102
Downloads_All_Time                0.2489343
Reverse_Depends_Count             1.0000000
# Melt the correlation matrix
melted_correlation_matrix <- melt(correlation_matrix)
ggplot(data = melted_correlation_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
geom_text(aes(label = round(value, 2)), color = "black") +
scale_fill_gradient2(low = "blue", high = "darkblue", mid = "lightblue", midpoint = 0, limit = c(-1, 1), space = "Lab", name="Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(title = "Correlation Matrix Heatmap", x = "", y = "")

library(broom)
# Filter out non-positive values and missing values
user_commits_filtered1 <- user_commits_distinct %>%
filter(stargazer_count_normalized > 0, !is.na(stargazer_count_normalized),
fork_count_normalized > 0, !is.na(fork_count_normalized),
Downloads_Normalized > 0, !is.na(Downloads_Normalized),
reverse_dep_normalized > 0, !is.na(reverse_dep_normalized))
# Perform linear regression and get summary and confidence intervals
model_stargazer <- lm(stargazer_count_normalized ~ team_size, data = user_commits_filtered1)
model_fork <- lm(fork_count_normalized ~ team_size, data = user_commits_filtered1)
model_downloads <- lm(log(Downloads_Normalized) ~ team_size, data = user_commits_filtered1)
model_revdep <- lm(log(reverse_dep_normalized) ~ team_size, data = user_commits_filtered1)
summary(model_stargazer)
Call:
lm(formula = stargazer_count_normalized ~ team_size, data = user_commits_filtered1)
Residuals:
Min 1Q Median 3Q Max
-693.28 -3.34 3.33 7.63 1143.15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.6722 3.5147 -2.467 0.0139 *
team_size 2.3348 0.0823 28.368 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 73.48 on 503 degrees of freedom
Multiple R-squared: 0.6154, Adjusted R-squared: 0.6146
F-statistic: 804.8 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_stargazer)
                 2.5 %    97.5 %
(Intercept) -15.577516 -1.766800
team_size     2.173114  2.496518
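The interval reported by `confint()` for an `lm` fit is the estimate plus or minus a t quantile times the standard error. As a check, the `team_size` interval above can be reproduced from the rounded values printed in the summary (estimate 2.3348, standard error 0.0823, 503 residual degrees of freedom):

```r
# Values copied from the printed summary(model_stargazer) output
estimate  <- 2.3348
std_error <- 0.0823
df_resid  <- 503

# Two-sided 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 3))
```

Small differences in the later decimals relative to `confint()` come from using rounded inputs.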
summary(model_fork)
Call:
lm(formula = fork_count_normalized ~ team_size, data = user_commits_filtered1)
Residuals:
Min 1Q Median 3Q Max
-248.51 -0.83 2.98 4.34 411.02
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.16034 1.22170 -4.224 2.85e-05 ***
team_size 0.82062 0.02861 28.685 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.54 on 503 degrees of freedom
Multiple R-squared: 0.6206, Adjusted R-squared: 0.6199
F-statistic: 822.8 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_fork)
                 2.5 %     97.5 %
(Intercept) -7.5605956 -2.7600782
team_size    0.7644126  0.8768258
summary(model_downloads)
Call:
lm(formula = log(Downloads_Normalized) ~ team_size, data = user_commits_filtered1)
Residuals:
Min 1Q Median 3Q Max
-9.4543 -1.7305 -0.1898 1.3366 5.3158
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.767840 0.097607 110.319 <2e-16 ***
team_size 0.021243 0.002286 9.294 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.041 on 503 degrees of freedom
Multiple R-squared: 0.1466, Adjusted R-squared: 0.1449
F-statistic: 86.38 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_downloads)
                  2.5 %      97.5 %
(Intercept) 10.57607342 10.95960736
team_size    0.01675265  0.02573381
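Because the response in `model_downloads` is log-transformed, the `team_size` coefficient acts multiplicatively on the original scale: each additional team member multiplies the predicted normalized downloads by `exp(coefficient)`. A short sketch converting the printed estimate and interval to percent changes:

```r
# Values copied from the printed model_downloads output above
estimate <- 0.021243
ci       <- c(lower = 0.01675265, upper = 0.02573381)

# Percent change in the response per one-unit increase in team_size
pct_change <- (exp(estimate) - 1) * 100
pct_ci     <- (exp(ci) - 1) * 100
print(round(c(estimate = pct_change, pct_ci), 2))
```

Under this model, one more team member is associated with roughly a 2.1% increase in normalized downloads. This is an association in the filtered sample, not a causal estimate.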
summary(model_revdep)
Call:
lm(formula = log(reverse_dep_normalized) ~ team_size, data = user_commits_filtered1)
Residuals:
Min 1Q Median 3Q Max
-3.8156 -0.7941 -0.2377 0.5859 4.6569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.304352 0.052235 -24.971 < 2e-16 ***
team_size 0.006325 0.001223 5.171 3.36e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.092 on 503 degrees of freedom
Multiple R-squared: 0.05048, Adjusted R-squared: 0.04859
F-statistic: 26.74 on 1 and 503 DF, p-value: 3.364e-07
confint(model_revdep)
                   2.5 %       97.5 %
(Intercept) -1.406977950 -1.201726971
team_size    0.003922063  0.008728401
# Tidy the models
tidy_stargazer <- tidy(model_stargazer, conf.int = TRUE)
tidy_fork <- tidy(model_fork, conf.int = TRUE)
tidy_downloads <- tidy(model_downloads, conf.int = TRUE)
tidy_revdep <- tidy(model_revdep, conf.int = TRUE)
# Combine the tidied data
tidy_combined <- bind_rows(
tidy_stargazer %>% mutate(model = "Stargazer Count Normalized"),
tidy_fork %>% mutate(model = "Fork Count Normalized"),
tidy_downloads %>% mutate(model = "Log Downloads Normalized"),
tidy_revdep %>% mutate(model = "Log Rev Dep Normalized")
)
# Filter out the intercept terms
tidy_combined <- tidy_combined %>% filter(term == "team_size")
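The four models have responses on different scales (two raw, two logged), so their `team_size` coefficients are not directly comparable in a single plot. One possible remedy, shown here as an assumption rather than the approach used in this analysis, is to standardize both variables so that each slope is in standard-deviation units. In a one-predictor model the standardized slope equals the Pearson correlation. The sketch uses simulated data with hypothetical names:

```r
set.seed(42)
# Simulated stand-ins; names are illustrative only
team_size <- rpois(300, lambda = 4) + 1
impact    <- 5 * team_size + rnorm(300, sd = 20)

# Standardize predictor and response to mean 0, sd 1
z_team   <- as.numeric(scale(team_size))
z_impact <- as.numeric(scale(impact))

fit_std   <- lm(z_impact ~ z_team)
std_slope <- unname(coef(fit_std)["z_team"])
print(c(std_slope = std_slope, pearson_r = cor(team_size, impact)))
```

Standardized slopes from the four models could then share one axis, at the cost of losing the original units.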
# Determine y-axis limits so the confidence intervals and the zero reference line are visible
y_limits <- range(0, tidy_combined$conf.low, tidy_combined$conf.high)
# Create the plot
ggplot(tidy_combined, aes(x = model, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
  geom_pointrange(size = 1.2) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") + # zero line; drawn vertically after coord_flip()
  coord_flip() +
  scale_y_continuous(limits = y_limits) + # Adjust the limits based on confidence intervals
labs(title = "Confidence Intervals for Team Size Coefficients",
x = "Model",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5), # Reduce title size
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
legend.position = "none"
) +
scale_color_brewer(palette = "Set2")

# Filter out non-positive values and missing values
user_commits_filtered2 <- user_commits_distinct %>%
filter(stargazer_count > 0, !is.na(stargazer_count),
fork_count > 0, !is.na(fork_count),
Downloads_All_Time > 0, !is.na(Downloads_All_Time),
Reverse_Depends_Count > 0, !is.na(Reverse_Depends_Count))
# Perform linear regression and get summary and confidence intervals
model_stargazer <- lm(stargazer_count ~ team_size, data = user_commits_filtered2)
model_fork <- lm(fork_count ~ team_size, data = user_commits_filtered2)
model_downloads <- lm(log(Downloads_All_Time) ~ team_size, data = user_commits_filtered2)
model_revdep <- lm(log(Reverse_Depends_Count) ~ team_size, data = user_commits_filtered2)
summary(model_stargazer)
Call:
lm(formula = stargazer_count ~ team_size, data = user_commits_filtered2)
Residuals:
Min 1Q Median 3Q Max
-7359.0 -26.5 56.7 98.6 10930.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -119.8956 33.6007 -3.568 0.000394 ***
team_size 24.3104 0.7868 30.897 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 702.5 on 503 degrees of freedom
Multiple R-squared: 0.6549, Adjusted R-squared: 0.6542
F-statistic: 954.6 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_stargazer)
                 2.5 %    97.5 %
(Intercept) -185.91056 -53.88072
team_size     22.76449  25.85622
summary(model_fork)
Call:
lm(formula = fork_count ~ team_size, data = user_commits_filtered2)
Residuals:
Min 1Q Median 3Q Max
-2631.8 -9.0 33.7 46.1 3907.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -58.4527 12.1551 -4.809 2.01e-06 ***
team_size 8.5845 0.2846 30.160 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 254.1 on 503 degrees of freedom
Multiple R-squared: 0.6439, Adjusted R-squared: 0.6432
F-statistic: 909.6 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_fork)
                 2.5 %     97.5 %
(Intercept) -82.333718 -34.571606
team_size     8.025327   9.143766
summary(model_downloads)
Call:
lm(formula = log(Downloads_All_Time) ~ team_size, data = user_commits_filtered2)
Residuals:
Min 1Q Median 3Q Max
-9.3728 -1.6455 -0.0803 1.5539 4.9093
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.920120 0.105600 122.350 <2e-16 ***
team_size 0.021367 0.002473 8.641 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.208 on 503 degrees of freedom
Multiple R-squared: 0.1293, Adjusted R-squared: 0.1275
F-statistic: 74.66 on 1 and 503 DF, p-value: < 2.2e-16
confint(model_downloads)
                  2.5 %      97.5 %
(Intercept) 12.71264851 13.12759066
team_size    0.01650887  0.02622552
summary(model_revdep)
Call:
lm(formula = log(Reverse_Depends_Count) ~ team_size, data = user_commits_filtered2)
Residuals:
Min 1Q Median 3Q Max
-4.5184 -0.7519 -0.2850 0.5455 4.6885
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.727682 0.051652 14.09 < 2e-16 ***
team_size 0.008079 0.001210 6.68 6.37e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.08 on 503 degrees of freedom
Multiple R-squared: 0.08147, Adjusted R-squared: 0.07965
F-statistic: 44.62 on 1 and 503 DF, p-value: 6.366e-11
confint(model_revdep)
                  2.5 %     97.5 %
(Intercept) 0.626202254 0.82916095
team_size   0.005702663 0.01045532
# Tidy the models
tidy_stargazer <- tidy(model_stargazer, conf.int = TRUE)
tidy_fork <- tidy(model_fork, conf.int = TRUE)
tidy_downloads <- tidy(model_downloads, conf.int = TRUE)
tidy_revdep <- tidy(model_revdep, conf.int = TRUE)
# Combine the tidied data
tidy_combined <- bind_rows(
tidy_stargazer %>% mutate(model = "Stargazer Count"),
tidy_fork %>% mutate(model = "Fork Count"),
tidy_downloads %>% mutate(model = "Log Downloads"),
tidy_revdep %>% mutate(model = "Log Rev Dep")
)
# Filter out the intercept terms
tidy_combined <- tidy_combined %>% filter(term == "team_size")
# Determine y-axis limits so the confidence intervals and the zero reference line are visible
y_limits <- range(0, tidy_combined$conf.low, tidy_combined$conf.high)
# Create the plot
ggplot(tidy_combined, aes(x = model, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
  geom_pointrange(size = 1.2) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") + # zero line; drawn vertically after coord_flip()
  coord_flip() +
  scale_y_continuous(limits = y_limits) + # Adjust the limits based on confidence intervals
labs(title = "Confidence Intervals for Team Size Coefficients",
x = "Model",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5), # Reduce title size
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
legend.position = "none"
) +
scale_color_brewer(palette = "Set2")

# Filter out non-positive values and missing values
user_commits_filtered3 <- user_commits_no_outliers_z %>%
filter(stargazer_count_normalized > 0, !is.na(stargazer_count_normalized),
fork_count_normalized > 0, !is.na(fork_count_normalized),
Downloads_Normalized > 0, !is.na(Downloads_Normalized),
reverse_dep_normalized > 0, !is.na(reverse_dep_normalized))
# Perform linear regression and get summary and confidence intervals
model_stargazer <- lm(stargazer_count_normalized ~ team_size, data = user_commits_filtered3)
model_fork <- lm(fork_count_normalized ~ team_size, data = user_commits_filtered3)
model_downloads <- lm(log(Downloads_Normalized) ~ team_size, data = user_commits_filtered3)
model_revdep <- lm(log(reverse_dep_normalized) ~ team_size, data = user_commits_filtered3)
summary(model_stargazer)
Call:
lm(formula = stargazer_count_normalized ~ team_size, data = user_commits_filtered3)
Residuals:
Min 1Q Median 3Q Max
-67.293 -7.024 -2.675 0.630 290.788
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9434 1.6179 -0.583 0.56
team_size 2.0315 0.1269 16.005 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.34 on 475 degrees of freedom
Multiple R-squared: 0.3503, Adjusted R-squared: 0.349
F-statistic: 256.1 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_stargazer)
                2.5 %   97.5 %
(Intercept) -4.122506 2.235645
team_size    1.782113 2.280959
summary(model_fork)
Call:
lm(formula = fork_count_normalized ~ team_size, data = user_commits_filtered3)
Residuals:
Min 1Q Median 3Q Max
-12.757 -1.314 -0.325 0.376 35.070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.30624 0.26143 -1.171 0.242
team_size 0.44082 0.02051 21.492 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.257 on 475 degrees of freedom
Multiple R-squared: 0.493, Adjusted R-squared: 0.4919
F-statistic: 461.9 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_fork)
                 2.5 %    97.5 %
(Intercept) -0.8199446 0.2074625
team_size    0.4005189 0.4811271
summary(model_downloads)
Call:
lm(formula = log(Downloads_Normalized) ~ team_size, data = user_commits_filtered3)
Residuals:
Min 1Q Median 3Q Max
-3.9299 -1.4529 -0.1037 1.2123 5.0561
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.93952 0.10758 92.39 <2e-16 ***
team_size 0.11191 0.00844 13.26 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.752 on 475 degrees of freedom
Multiple R-squared: 0.2701, Adjusted R-squared: 0.2686
F-statistic: 175.8 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_downloads)
                2.5 %     97.5 %
(Intercept) 9.7281316 10.1509103
team_size   0.0953228  0.1284931
summary(model_revdep)
Call:
lm(formula = log(reverse_dep_normalized) ~ team_size, data = user_commits_filtered3)
Residuals:
Min 1Q Median 3Q Max
-1.8315 -0.7179 -0.2284 0.5128 4.6767
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.505822 0.062169 -24.221 < 2e-16 ***
team_size 0.026514 0.004878 5.436 8.73e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.012 on 475 degrees of freedom
Multiple R-squared: 0.05856, Adjusted R-squared: 0.05658
F-statistic: 29.55 on 1 and 475 DF, p-value: 8.734e-08
confint(model_revdep)
                  2.5 %      97.5 %
(Intercept) -1.62798240 -1.38366168
team_size    0.01692987  0.03609875
# Tidy the models
tidy_stargazer <- tidy(model_stargazer, conf.int = TRUE)
tidy_fork <- tidy(model_fork, conf.int = TRUE)
tidy_downloads <- tidy(model_downloads, conf.int = TRUE)
tidy_revdep <- tidy(model_revdep, conf.int = TRUE)
# Combine the tidied data
tidy_combined <- bind_rows(
tidy_stargazer %>% mutate(model = "Stargazer Count Normalized"),
tidy_fork %>% mutate(model = "Fork Count Normalized"),
tidy_downloads %>% mutate(model = "Log Downloads Normalized"),
tidy_revdep %>% mutate(model = "Log Rev Dep Normalized")
)
# Filter out the intercept terms
tidy_combined <- tidy_combined %>% filter(term == "team_size")
# Determine y-axis limits so the confidence intervals and the zero reference line are visible
y_limits <- range(0, tidy_combined$conf.low, tidy_combined$conf.high)
# Create the plot
ggplot(tidy_combined, aes(x = model, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
  geom_pointrange(size = 1.2) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") + # zero line; drawn vertically after coord_flip()
  coord_flip() +
  scale_y_continuous(limits = y_limits) + # Adjust the limits based on confidence intervals
labs(title = "Confidence Intervals for Team Size Coefficients",
x = "Model",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5), # Reduce title size
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
legend.position = "none"
) +
scale_color_brewer(palette = "Set2")

# Filter out non-positive values and missing values
user_commits_filtered4 <- user_commits_no_outliers_z %>%
filter(stargazer_count > 0, !is.na(stargazer_count),
fork_count > 0, !is.na(fork_count),
Downloads_All_Time > 0, !is.na(Downloads_All_Time),
Reverse_Depends_Count > 0, !is.na(Reverse_Depends_Count))
# Perform linear regression and get summary and confidence intervals
model_stargazer <- lm(stargazer_count ~ team_size, data = user_commits_filtered4)
model_fork <- lm(fork_count ~ team_size, data = user_commits_filtered4)
model_downloads <- lm(log(Downloads_All_Time) ~ team_size, data = user_commits_filtered4)
model_revdep <- lm(log(Reverse_Depends_Count) ~ team_size, data = user_commits_filtered4)
summary(model_stargazer)
Call:
lm(formula = stargazer_count ~ team_size, data = user_commits_filtered4)
Residuals:
Min 1Q Median 3Q Max
-705.46 -57.03 -15.65 9.58 1986.20
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.870 13.223 -1.578 0.115
team_size 18.762 1.037 18.085 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 215.3 on 475 degrees of freedom
Multiple R-squared: 0.4078, Adjusted R-squared: 0.4065
F-statistic: 327.1 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_stargazer)
                2.5 %    97.5 %
(Intercept) -46.85223  5.113025
team_size    16.72326 20.800338
summary(model_fork)
Call:
lm(formula = fork_count ~ team_size, data = user_commits_filtered4)
Residuals:
Min 1Q Median 3Q Max
-132.02 -11.93 -2.18 4.07 419.98
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.1981 2.4687 -2.106 0.0358 *
team_size 4.1882 0.1937 21.624 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 40.2 on 475 degrees of freedom
Multiple R-squared: 0.4961, Adjusted R-squared: 0.495
F-statistic: 467.6 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_fork)
                 2.5 %    97.5 %
(Intercept) -10.048919 -0.347224
team_size     3.807587  4.568761
summary(model_downloads)
Call:
lm(formula = log(Downloads_All_Time) ~ team_size, data = user_commits_filtered4)
Residuals:
Min 1Q Median 3Q Max
-4.8558 -1.4676 -0.0868 1.3518 4.7035
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.082794 0.119156 101.40 <2e-16 ***
team_size 0.112912 0.009349 12.08 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.94 on 475 degrees of freedom
Multiple R-squared: 0.2349, Adjusted R-squared: 0.2333
F-statistic: 145.9 on 1 and 475 DF, p-value: < 2.2e-16
confint(model_downloads)
                  2.5 %     97.5 %
(Intercept) 11.84865489 12.3169324
team_size    0.09454155  0.1312816
summary(model_revdep)
Call:
lm(formula = log(Reverse_Depends_Count) ~ team_size, data = user_commits_filtered4)
Residuals:
Min 1Q Median 3Q Max
-2.0066 -0.6009 -0.4928 0.4789 4.7078
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.456771 0.058992 7.743 5.91e-14 ***
team_size 0.036043 0.004628 7.787 4.33e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9605 on 475 degrees of freedom
Multiple R-squared: 0.1132, Adjusted R-squared: 0.1114
F-statistic: 60.64 on 1 and 475 DF, p-value: 4.326e-14
confint(model_revdep)
                 2.5 %    97.5 %
(Intercept) 0.34085461 0.5726880
team_size   0.02694846 0.0451376
# Tidy the models
tidy_stargazer <- tidy(model_stargazer, conf.int = TRUE)
tidy_fork <- tidy(model_fork, conf.int = TRUE)
tidy_downloads <- tidy(model_downloads, conf.int = TRUE)
tidy_revdep <- tidy(model_revdep, conf.int = TRUE)
# Combine the tidied data
tidy_combined <- bind_rows(
tidy_stargazer %>% mutate(model = "Stargazer Count"),
tidy_fork %>% mutate(model = "Fork Count"),
tidy_downloads %>% mutate(model = "Log Downloads"),
tidy_revdep %>% mutate(model = "Log Rev Dep")
)
# Filter out the intercept terms
tidy_combined <- tidy_combined %>% filter(term == "team_size")
# Determine y-axis limits so the confidence intervals and the zero reference line are visible
y_limits <- range(0, tidy_combined$conf.low, tidy_combined$conf.high)
# Create the plot
ggplot(tidy_combined, aes(x = model, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
  geom_pointrange(size = 1.2) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") + # zero line; drawn vertically after coord_flip()
  coord_flip() +
  scale_y_continuous(limits = y_limits) + # Adjust the limits based on confidence intervals
labs(title = "Confidence Intervals for Team Size Coefficients",
x = "Model",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5), # Reduce title size
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
legend.position = "none"
) +
scale_color_brewer(palette = "Set2")

# Treat the team size bin as a categorical predictor
user_commits_filtered1$team_size_bin <- as.factor(user_commits_filtered1$team_size_bin)
# Perform regression analysis using team_size_bin
model_stargazer_bin <- lm(stargazer_count_normalized ~ team_size_bin, data = user_commits_filtered1)
model_fork_bin <- lm(fork_count_normalized ~ team_size_bin, data = user_commits_filtered1)
model_downloads_bin <- lm(Downloads_Normalized ~ team_size_bin, data = user_commits_filtered1)
model_revdep_bin <- lm(reverse_dep_normalized ~ team_size_bin, data = user_commits_filtered1)
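With a factor predictor, `lm()` uses treatment contrasts by default: the intercept is the mean of the reference level, and each remaining coefficient is the difference between that level's mean and the reference mean. A minimal sketch with made-up bins and values (the bin labels are hypothetical, not the ones in `team_size_bin`):

```r
# Made-up data: three hypothetical team size bins
toy <- data.frame(
  bin   = factor(rep(c("1", "2-5", "6+"), each = 4), levels = c("1", "2-5", "6+")),
  stars = c(1, 2, 3, 2, 5, 7, 6, 6, 20, 25, 22, 21)
)

fit <- lm(stars ~ bin, data = toy)
group_means <- tapply(toy$stars, toy$bin, mean)

# Coefficient for "2-5" is its group mean minus the reference ("1") mean
diff_2_5 <- unname(group_means["2-5"] - group_means["1"])

print(coef(fit))
print(group_means)
```

This is why the intercept row is filtered out before plotting: the remaining terms are contrasts against the reference bin, not bin means.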
# Tidy the models
tidy_stargazer_bin <- tidy(model_stargazer_bin, conf.int = TRUE)
tidy_fork_bin <- tidy(model_fork_bin, conf.int = TRUE)
tidy_downloads_bin <- tidy(model_downloads_bin, conf.int = TRUE)
tidy_revdep_bin <- tidy(model_revdep_bin, conf.int = TRUE)
# Combine the tidied data
tidy_combined_bin <- bind_rows(
tidy_stargazer_bin %>% mutate(model = "Stargazer Count Normalized"),
tidy_fork_bin %>% mutate(model = "Fork Count Normalized"),
tidy_downloads_bin %>% mutate(model = "Downloads Normalized"),
tidy_revdep_bin %>% mutate(model = "Rev Dep Normalized")
)
# Filter out the intercept terms
tidy_combined_bin <- tidy_combined_bin %>% filter(term != "(Intercept)")
# Create the plot
ggplot(tidy_combined_bin, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
geom_pointrange(size = 1.2) +
geom_point(size = 3) +
coord_flip() +
facet_wrap(~ model, scales = "free_y") +
labs(title = "Regression Coefficients for Team Size Bins",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
strip.text = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_blank()
) +
scale_color_brewer(palette = "Set1")

ggplot(tidy_stargazer_bin %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_pointrange(size = 1.2, color = "darkblue") +
geom_point(size = 4, shape = 21, fill = "blue") +
coord_flip() +
labs(title = "Regression Coefficients for Team Size Bins (Stargazer Count Normalized)",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_bw() +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "lightgray", size = 0.25),
panel.background = element_rect(fill = "white", color = "black")
)
ggplot(tidy_fork_bin %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_pointrange(size = 1.2, color = "darkgreen") +
geom_point(size = 4, shape = 21, fill = "green") +
coord_flip() +
labs(title = "Regression Coefficients for Team Size Bins (Fork Count Normalized)",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_bw() +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "lightgray", size = 0.25),
panel.background = element_rect(fill = "white", color = "black")
)
ggplot(tidy_downloads_bin %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_pointrange(size = 1.2, color = "darkred") +
geom_point(size = 4, shape = 21, fill = "red") +
coord_flip() +
labs(title = "Regression Coefficients for Team Size Bins (Downloads Normalized)",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_bw() +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "lightgray", size = 0.25),
panel.background = element_rect(fill = "white", color = "black")
)
ggplot(tidy_revdep_bin %>% filter(term != "(Intercept)"),
aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high)) +
geom_pointrange(size = 1.2, color = "darkred") +
geom_point(size = 4, shape = 21, fill = "red") +
coord_flip() +
labs(title = "Regression Coefficients for Team Size Bins (Rev Dep Normalized)",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_bw() +
theme(
plot.title = element_text(size = 10, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 12),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "lightgray", size = 0.25),
panel.background = element_rect(fill = "white", color = "black")
)
library(multcomp)
# Perform Tukey's HSD test
tukey_test <- glht(model_stargazer_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = stargazer_count_normalized ~ team_size_bin, data = user_commits_filtered1)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 2.077 21.749 0.095 0.99998
[3-4] - [1] == 0 3.696 20.232 0.183 0.99974
[5-7] - [1] == 0 7.903 21.749 0.363 0.99613
[8-880] - [1] == 0 62.245 19.093 3.260 0.01008 *
[3-4] - [2] == 0 1.620 17.061 0.095 0.99998
[5-7] - [2] == 0 5.826 18.835 0.309 0.99793
[8-880] - [2] == 0 60.168 15.694 3.834 0.00133 **
[5-7] - [3-4] == 0 4.206 17.061 0.247 0.99915
[8-880] - [3-4] == 0 58.548 13.514 4.332 < 0.001 ***
[8-880] - [5-7] == 0 54.342 15.694 3.463 0.00499 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
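As a reading aid for the table above: under the default treatment coding in `lm`, each team-size coefficient is the difference between that bin's mean and the mean of the reference bin `[1]`, so the `[k] - [1]` rows of a Tukey table estimate the same quantities as the regression coefficients. The sketch below checks this on simulated data. The `toy` data frame and its `outcome` column are hypothetical and are not part of our dataset.

```r
set.seed(1)

# Hypothetical toy data: three team-size bins with different mean outcomes
toy <- data.frame(
  team_size_bin = factor(rep(c("[1]", "[2]", "[3-4]"), each = 20),
                         levels = c("[1]", "[2]", "[3-4]")),
  outcome = c(rnorm(20, mean = 10), rnorm(20, mean = 12), rnorm(20, mean = 15))
)

# Same model form as in the analysis: outcome regressed on the binned factor
fit <- lm(outcome ~ team_size_bin, data = toy)

# Per-bin means as a plain named numeric vector
group_means <- vapply(split(toy$outcome, toy$team_size_bin), mean, numeric(1))

# Non-intercept coefficients versus differences from the reference bin mean
coef_diffs <- unname(coef(fit)[-1])
mean_diffs <- unname(group_means[-1] - group_means[["[1]"]])

print(cbind(coef_diffs, mean_diffs))
```

The remaining rows of the Tukey table (for example `[3-4] - [2]`) are differences between two non-reference bins, which the coefficient plots do not show directly.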
# Perform Tukey's HSD test
tukey_test <- glht(model_fork_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = fork_count_normalized ~ team_size_bin, data = user_commits_filtered1)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 0.2323 7.6877 0.030 1.00000
[3-4] - [1] == 0 0.7922 7.1516 0.111 0.99996
[5-7] - [1] == 0 1.8214 7.6877 0.237 0.99927
[8-880] - [1] == 0 17.6007 6.7491 2.608 0.06775 .
[3-4] - [2] == 0 0.5599 6.0308 0.093 0.99998
[5-7] - [2] == 0 1.5891 6.6578 0.239 0.99925
[8-880] - [2] == 0 17.3684 5.5476 3.131 0.01517 *
[5-7] - [3-4] == 0 1.0292 6.0308 0.171 0.99980
[8-880] - [3-4] == 0 16.8085 4.7770 3.519 0.00411 **
[8-880] - [5-7] == 0 15.7793 5.5476 2.844 0.03582 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_revdep_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = reverse_dep_normalized ~ team_size_bin, data = user_commits_filtered1)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 -0.18017 0.53574 -0.336 0.9971
[3-4] - [1] == 0 -0.05556 0.49838 -0.111 1.0000
[5-7] - [1] == 0 0.77634 0.53574 1.449 0.5881
[8-880] - [1] == 0 1.00912 0.47033 2.146 0.1968
[3-4] - [2] == 0 0.12462 0.42027 0.297 0.9982
[5-7] - [2] == 0 0.95651 0.46396 2.062 0.2320
[8-880] - [2] == 0 1.18929 0.38660 3.076 0.0181 *
[5-7] - [3-4] == 0 0.83189 0.42027 1.979 0.2705
[8-880] - [3-4] == 0 1.06468 0.33290 3.198 0.0122 *
[8-880] - [5-7] == 0 0.23278 0.38660 0.602 0.9738
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_downloads_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Downloads_Normalized ~ team_size_bin, data = user_commits_filtered1)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 16770 310663 0.054 1.000
[3-4] - [1] == 0 -19646 288997 -0.068 1.000
[5-7] - [1] == 0 110083 310663 0.354 0.996
[8-880] - [1] == 0 1501247 272734 5.504 <1e-05 ***
[3-4] - [2] == 0 -36416 243704 -0.149 1.000
[5-7] - [2] == 0 93313 269042 0.347 0.997
[8-880] - [2] == 0 1484477 224178 6.622 <1e-05 ***
[5-7] - [3-4] == 0 129729 243704 0.532 0.983
[8-880] - [3-4] == 0 1520893 193039 7.879 <1e-05 ***
[8-880] - [5-7] == 0 1391164 224178 6.206 <1e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
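The four Tukey calls above differ only in the model they are given, so they could be run in one pass. The following is a minimal sketch of that pattern, assuming the fitted models are collected in a named list. The `toy` data frame and its `stars` and `forks` columns are simulated stand-ins rather than our data, and the list names are illustrative.

```r
library(multcomp)

set.seed(42)

# Simulated stand-in data with a three-level team-size factor
toy <- data.frame(
  team_size_bin = factor(rep(c("[1]", "[2]", "[3-4]"), each = 30),
                         levels = c("[1]", "[2]", "[3-4]")),
  stars = rpois(90, lambda = 20),
  forks = rpois(90, lambda = 5)
)

# One model per impact measure, kept in a named list
models <- list(
  stars = lm(stars ~ team_size_bin, data = toy),
  forks = lm(forks ~ team_size_bin, data = toy)
)

# Apply the same all-pairs (Tukey) contrasts to every model
tukey_results <- lapply(models, function(m) {
  summary(glht(m, linfct = mcp(team_size_bin = "Tukey")))
})

# Adjusted p-values for one measure, one entry per pairwise contrast
tukey_results$stars$test$pvalues
```

Keeping the results in a list also avoids overwriting `tukey_test` on each call, so the summaries for different measures remain available for later comparison.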
user_commits_filtered2$team_size_bin <- as.factor(user_commits_filtered2$team_size_bin)
# Perform regression analysis using team_size_bin
model_stargazer_bin <- lm(stargazer_count ~ team_size_bin, data = user_commits_filtered2)
model_fork_bin <- lm(fork_count ~ team_size_bin, data = user_commits_filtered2)
model_downloads_bin <- lm(Downloads_All_Time ~ team_size_bin, data = user_commits_filtered2)
model_revdep_bin <- lm(Reverse_Depends_Count ~ team_size_bin, data = user_commits_filtered2)
# Tidy the models
tidy_stargazer_bin <- tidy(model_stargazer_bin, conf.int = TRUE)
tidy_fork_bin <- tidy(model_fork_bin, conf.int = TRUE)
tidy_downloads_bin <- tidy(model_downloads_bin, conf.int = TRUE)
tidy_revdep_bin <- tidy(model_revdep_bin, conf.int = TRUE)
# Combine the tidied data
tidy_combined_bin <- bind_rows(
tidy_stargazer_bin %>% mutate(model = "Stargazer Count"),
tidy_fork_bin %>% mutate(model = "Fork Count"),
tidy_downloads_bin %>% mutate(model = "Downloads"),
tidy_revdep_bin %>% mutate(model = "Rev Dep")
)
# Filter out the intercept terms
tidy_combined_bin <- tidy_combined_bin %>% filter(term != "(Intercept)")
# Create the plot
ggplot(tidy_combined_bin, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
geom_pointrange(size = 1.2) +
geom_point(size = 3) +
coord_flip() +
facet_wrap(~ model, scales = "free_y") +
labs(title = "Regression Coefficients for Team Size Bins",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
strip.text = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_blank()
) +
scale_color_brewer(palette = "Set1")
# Perform Tukey's HSD test
tukey_test <- glht(model_stargazer_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = stargazer_count ~ team_size_bin, data = user_commits_filtered2)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 17.24 219.92 0.078 0.99999
[3-4] - [1] == 0 29.94 204.58 0.146 0.99989
[5-7] - [1] == 0 64.85 219.92 0.295 0.99828
[8-880] - [1] == 0 603.08 193.07 3.124 0.01544 *
[3-4] - [2] == 0 12.71 172.52 0.074 0.99999
[5-7] - [2] == 0 47.61 190.46 0.250 0.99910
[8-880] - [2] == 0 585.85 158.70 3.692 0.00221 **
[5-7] - [3-4] == 0 34.91 172.52 0.202 0.99961
[8-880] - [3-4] == 0 573.14 136.65 4.194 < 0.001 ***
[8-880] - [5-7] == 0 538.23 158.70 3.392 0.00642 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_fork_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = fork_count ~ team_size_bin, data = user_commits_filtered2)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 2.280 78.960 0.029 1.00000
[3-4] - [1] == 0 7.289 73.453 0.099 0.99998
[5-7] - [1] == 0 16.253 78.960 0.206 0.99958
[8-880] - [1] == 0 179.502 69.320 2.589 0.07113 .
[3-4] - [2] == 0 5.009 61.941 0.081 0.99999
[5-7] - [2] == 0 13.973 68.381 0.204 0.99960
[8-880] - [2] == 0 177.222 56.979 3.110 0.01614 *
[5-7] - [3-4] == 0 8.964 61.941 0.145 0.99990
[8-880] - [3-4] == 0 172.213 49.064 3.510 0.00428 **
[8-880] - [5-7] == 0 163.249 56.979 2.865 0.03379 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_revdep_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Reverse_Depends_Count ~ team_size_bin, data = user_commits_filtered2)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 -0.4311 5.5480 -0.078 0.99999
[3-4] - [1] == 0 0.4615 5.1611 0.089 0.99998
[5-7] - [1] == 0 4.2622 5.5480 0.768 0.93756
[8-880] - [1] == 0 12.5728 4.8707 2.581 0.07263 .
[3-4] - [2] == 0 0.8926 4.3522 0.205 0.99959
[5-7] - [2] == 0 4.6933 4.8047 0.977 0.86161
[8-880] - [2] == 0 13.0039 4.0035 3.248 0.01043 *
[5-7] - [3-4] == 0 3.8007 4.3522 0.873 0.90365
[8-880] - [3-4] == 0 12.1113 3.4474 3.513 0.00417 **
[8-880] - [5-7] == 0 8.3106 4.0035 2.076 0.22580
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_downloads_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Downloads_All_Time ~ team_size_bin, data = user_commits_filtered2)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 98092 2685882 0.037 1.000
[3-4] - [1] == 0 -201122 2498570 -0.080 1.000
[5-7] - [1] == 0 1193374 2685882 0.444 0.992
[8-880] - [1] == 0 13768280 2357961 5.839 <1e-05 ***
[3-4] - [2] == 0 -299214 2106979 -0.142 1.000
[5-7] - [2] == 0 1095282 2326042 0.471 0.990
[8-880] - [2] == 0 13670188 1938167 7.053 <1e-05 ***
[5-7] - [3-4] == 0 1394496 2106979 0.662 0.963
[8-880] - [3-4] == 0 13969402 1668946 8.370 <1e-05 ***
[8-880] - [5-7] == 0 12574906 1938167 6.488 <1e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
user_commits_filtered3$team_size_bin <- as.factor(user_commits_filtered3$team_size_bin)
# Perform regression analysis using team_size_bin
model_stargazer_bin <- lm(stargazer_count_normalized ~ team_size_bin, data = user_commits_filtered3)
model_fork_bin <- lm(fork_count_normalized ~ team_size_bin, data = user_commits_filtered3)
model_downloads_bin <- lm(Downloads_Normalized ~ team_size_bin, data = user_commits_filtered3)
model_revdep_bin <- lm(reverse_dep_normalized ~ team_size_bin, data = user_commits_filtered3)
# Tidy the models
tidy_stargazer_bin <- tidy(model_stargazer_bin, conf.int = TRUE)
tidy_fork_bin <- tidy(model_fork_bin, conf.int = TRUE)
tidy_downloads_bin <- tidy(model_downloads_bin, conf.int = TRUE)
tidy_revdep_bin <- tidy(model_revdep_bin, conf.int = TRUE)
# Combine the tidied data
tidy_combined_bin <- bind_rows(
tidy_stargazer_bin %>% mutate(model = "Stargazer Count Normalized"),
tidy_fork_bin %>% mutate(model = "Fork Count Normalized"),
tidy_downloads_bin %>% mutate(model = "Downloads Normalized"),
tidy_revdep_bin %>% mutate(model = "Rev Dep Normalized")
)
# Filter out the intercept terms
tidy_combined_bin <- tidy_combined_bin %>% filter(term != "(Intercept)")
# Create the plot
ggplot(tidy_combined_bin, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
geom_pointrange(size = 1.2) +
geom_point(size = 3) +
coord_flip() +
facet_wrap(~ model, scales = "free_y") +
labs(title = "Regression Coefficients for Team Size Bins",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
strip.text = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_blank()
) +
scale_color_brewer(palette = "Set1")
# Perform Tukey's HSD test
tukey_test <- glht(model_stargazer_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = stargazer_count_normalized ~ team_size_bin, data = user_commits_filtered3)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 2.077 5.494 0.378 0.996
[3-4] - [1] == 0 3.696 5.111 0.723 0.950
[5-6] - [1] == 0 7.903 5.494 1.438 0.596
[7-58] - [1] == 0 34.909 4.900 7.124 <1e-04 ***
[3-4] - [2] == 0 1.620 4.310 0.376 0.996
[5-6] - [2] == 0 5.826 4.758 1.225 0.732
[7-58] - [2] == 0 32.833 4.058 8.092 <1e-04 ***
[5-6] - [3-4] == 0 4.206 4.310 0.976 0.863
[7-58] - [3-4] == 0 31.213 3.521 8.864 <1e-04 ***
[7-58] - [5-6] == 0 27.006 4.058 6.656 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_fork_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = fork_count_normalized ~ team_size_bin, data = user_commits_filtered3)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 0.2323 0.9737 0.239 0.999
[3-4] - [1] == 0 0.7922 0.9058 0.875 0.904
[5-6] - [1] == 0 1.8214 0.9737 1.871 0.328
[7-58] - [1] == 0 7.0720 0.8685 8.143 <1e-04 ***
[3-4] - [2] == 0 0.5599 0.7639 0.733 0.947
[5-6] - [2] == 0 1.5891 0.8433 1.884 0.321
[7-58] - [2] == 0 6.8397 0.7192 9.511 <1e-04 ***
[5-6] - [3-4] == 0 1.0292 0.7639 1.347 0.655
[7-58] - [3-4] == 0 6.2798 0.6241 10.062 <1e-04 ***
[7-58] - [5-6] == 0 5.2506 0.7192 7.301 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_revdep_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = reverse_dep_normalized ~ team_size_bin, data = user_commits_filtered3)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 -0.18017 0.45490 -0.396 0.9946
[3-4] - [1] == 0 -0.05556 0.42317 -0.131 0.9999
[5-6] - [1] == 0 0.77634 0.45490 1.707 0.4239
[7-58] - [1] == 0 0.65605 0.40571 1.617 0.4801
[3-4] - [2] == 0 0.12462 0.35685 0.349 0.9967
[5-6] - [2] == 0 0.95651 0.39395 2.428 0.1066
[7-58] - [2] == 0 0.83623 0.33596 2.489 0.0922 .
[5-6] - [3-4] == 0 0.83189 0.35685 2.331 0.1331
[7-58] - [3-4] == 0 0.71161 0.29157 2.441 0.1033
[7-58] - [5-6] == 0 -0.12028 0.33596 -0.358 0.9964
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_downloads_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Downloads_Normalized ~ team_size_bin, data = user_commits_filtered3)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 16770 245757 0.068 1.000
[3-4] - [1] == 0 -19646 228618 -0.086 1.000
[5-6] - [1] == 0 110083 245757 0.448 0.991
[7-58] - [1] == 0 1035129 219187 4.723 3.01e-05 ***
[3-4] - [2] == 0 -36416 192788 -0.189 1.000
[5-6] - [2] == 0 93313 212832 0.438 0.992
[7-58] - [2] == 0 1018359 181504 5.611 < 1e-05 ***
[5-6] - [3-4] == 0 129729 192788 0.673 0.961
[7-58] - [3-4] == 0 1054775 157522 6.696 < 1e-05 ***
[7-58] - [5-6] == 0 925047 181504 5.097 < 1e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
user_commits_filtered4$team_size_bin <- as.factor(user_commits_filtered4$team_size_bin)
# Perform regression analysis using team_size_bin
model_stargazer_bin <- lm(stargazer_count ~ team_size_bin, data = user_commits_filtered4)
model_fork_bin <- lm(fork_count ~ team_size_bin, data = user_commits_filtered4)
model_downloads_bin <- lm(Downloads_All_Time ~ team_size_bin, data = user_commits_filtered4)
model_revdep_bin <- lm(Reverse_Depends_Count ~ team_size_bin, data = user_commits_filtered4)
# Tidy the models
tidy_stargazer_bin <- tidy(model_stargazer_bin, conf.int = TRUE)
tidy_fork_bin <- tidy(model_fork_bin, conf.int = TRUE)
tidy_downloads_bin <- tidy(model_downloads_bin, conf.int = TRUE)
tidy_revdep_bin <- tidy(model_revdep_bin, conf.int = TRUE)
# Combine the tidied data
tidy_combined_bin <- bind_rows(
tidy_stargazer_bin %>% mutate(model = "Stargazer Count"),
tidy_fork_bin %>% mutate(model = "Fork Count"),
tidy_downloads_bin %>% mutate(model = "Downloads"),
tidy_revdep_bin %>% mutate(model = "Rev Dep")
)
# Filter out the intercept terms
tidy_combined_bin <- tidy_combined_bin %>% filter(term != "(Intercept)")
# Create the plot
ggplot(tidy_combined_bin, aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
geom_pointrange(size = 1.2) +
geom_point(size = 3) +
coord_flip() +
facet_wrap(~ model, scales = "free_y") +
labs(title = "Regression Coefficients for Team Size Bins",
x = "Team Size Bin",
y = "Coefficient Estimate") +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 12, face = "bold"),
axis.title.y = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
strip.text = element_text(size = 12, face = "bold"),
legend.position = "bottom",
legend.title = element_blank()
) +
scale_color_brewer(palette = "Set1")
# Perform Tukey's HSD test
tukey_test <- glht(model_stargazer_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = stargazer_count ~ team_size_bin, data = user_commits_filtered4)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 17.24 46.46 0.371 0.996
[3-4] - [1] == 0 29.94 43.22 0.693 0.957
[5-6] - [1] == 0 64.85 46.46 1.396 0.624
[7-58] - [1] == 0 309.56 41.44 7.470 <1e-04 ***
[3-4] - [2] == 0 12.71 36.45 0.349 0.997
[5-6] - [2] == 0 47.61 40.24 1.183 0.756
[7-58] - [2] == 0 292.32 34.32 8.519 <1e-04 ***
[5-6] - [3-4] == 0 34.91 36.45 0.958 0.871
[7-58] - [3-4] == 0 279.61 29.78 9.389 <1e-04 ***
[7-58] - [5-6] == 0 244.71 34.32 7.131 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_fork_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = fork_count ~ team_size_bin, data = user_commits_filtered4)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 2.280 9.277 0.246 0.999
[3-4] - [1] == 0 7.289 8.630 0.845 0.914
[5-6] - [1] == 0 16.253 9.277 1.752 0.396
[7-58] - [1] == 0 65.752 8.274 7.947 <1e-04 ***
[3-4] - [2] == 0 5.009 7.278 0.688 0.958
[5-6] - [2] == 0 13.973 8.034 1.739 0.404
[7-58] - [2] == 0 63.472 6.852 9.264 <1e-04 ***
[5-6] - [3-4] == 0 8.964 7.278 1.232 0.727
[7-58] - [3-4] == 0 58.463 5.946 9.832 <1e-04 ***
[7-58] - [5-6] == 0 49.498 6.852 7.224 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_revdep_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Reverse_Depends_Count ~ team_size_bin, data = user_commits_filtered4)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 -0.4311 3.4936 -0.123 0.9999
[3-4] - [1] == 0 0.4615 3.2499 0.142 0.9999
[5-6] - [1] == 0 4.2622 3.4936 1.220 0.7344
[7-58] - [1] == 0 7.6828 3.1158 2.466 0.0974 .
[3-4] - [2] == 0 0.8926 2.7406 0.326 0.9975
[5-6] - [2] == 0 4.6933 3.0255 1.551 0.5226
[7-58] - [2] == 0 8.1139 2.5802 3.145 0.0147 *
[5-6] - [3-4] == 0 3.8007 2.7406 1.387 0.6300
[7-58] - [3-4] == 0 7.2213 2.2393 3.225 0.0115 *
[7-58] - [5-6] == 0 3.4206 2.5802 1.326 0.6692
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
# Perform Tukey's HSD test
tukey_test <- glht(model_downloads_bin, linfct = mcp(team_size_bin = "Tukey"))
# Summary of the Tukey test results
summary(tukey_test)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = Downloads_All_Time ~ team_size_bin, data = user_commits_filtered4)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
[2] - [1] == 0 98092 2054842 0.048 1.000
[3-4] - [1] == 0 -201122 1911539 -0.105 1.000
[5-6] - [1] == 0 1193374 2054842 0.581 0.977
[7-58] - [1] == 0 9449024 1832678 5.156 <1e-05 ***
[3-4] - [2] == 0 -299214 1611951 -0.186 1.000
[5-6] - [2] == 0 1095282 1779546 0.615 0.972
[7-58] - [2] == 0 9350931 1517602 6.162 <1e-05 ***
[5-6] - [3-4] == 0 1394496 1611951 0.865 0.907
[7-58] - [3-4] == 0 9650145 1317087 7.327 <1e-05 ***
[7-58] - [5-6] == 0 8255650 1517602 5.440 <1e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)